Method for identifying polymorphisms

ABSTRACT

The present invention relates to methods for the detection of polymorphism in polynucleotides by using hybridization of fragments of segments of a polynucleotide suspected of containing a polymorphism with an oligonucleotide having a sequence complementary to a fragment identifying the polymorphism and subsequent detection of incorporated labels in the oligonucleotide-fragment duplex.

RELATED APPLICATIONS

This application is a divisional of U.S. application Ser. No. 09/655,104, filed Sep. 9, 2000. Application Ser. No. 09/655,104 is a continuation-in-part of U.S. application Ser. Nos. 09/394,467, 09/394,457, 09/394,774 and 09/394,387, all of which were filed Sep. 10, 1999, and are entitled “A METHOD FOR ANALYZING POLYNUCLEOTIDES.” Each of the foregoing applications claims the benefit of U.S. Provisional Patent Application Ser. No. 60/102,724, filed Oct. 1, 1998, also entitled “A METHOD FOR ANALYZING POLYNUCLEOTIDES.” All of the above applications are incorporated herein by reference in their entireties, including drawings and tables.

FIELD OF THE INVENTION

The present invention relates generally to organic chemistry, analytical chemistry, biochemistry, molecular biology, genetics, diagnostics and medicine. More specifically, it relates to a method for detecting polymorphisms, in particular single nucleotide polymorphisms, in polynucleotides.

BACKGROUND OF THE INVENTION

The following is offered as background information only and is not intended nor admitted to be prior art to the present invention.

DNA is the carrier of the genetic information of all living cells. An organism's genetic and physical characteristics, its genotype and phenotype, respectively, are controlled by precise nucleic acid sequences in the organism's DNA. The sum total of all of the sequence information present in an organism's DNA is termed the organism's “genome.” The nucleic acid sequence of a DNA molecule consists of a linear polymer of four “nucleotides.” The four nucleotides are tripartite molecules, each consisting of (1) one of the four heterocyclic bases, adenine (abbreviated “A”), cytosine (“C”), guanine (“G”) and thymine (“T”); (2) the pentose sugar derivative 2-deoxyribose, which is bonded by its 1-carbon atom to a ring nitrogen atom of the heterocyclic base; and (3) a monophosphate monoester formed between a phosphoric acid molecule and the 5′-hydroxy group of the sugar moiety. The nucleotides polymerize by the formation of diesters between the 5′-phosphate of one nucleotide and the 3′-hydroxy group of another nucleotide to give a single strand of DNA. In nature, two of these single strands interact by hydrogen bonding between complementary nucleotides, A being complementary with T and C being complementary with G, to form “base-pairs,” which results in the formation of the well-known DNA “double helix” of Watson and Crick. RNA is similar to DNA except that the base thymine is replaced with uracil (“U”) and the pentose sugar is ribose itself rather than deoxyribose. In addition, RNA exists in nature predominantly as a single strand; i.e., two strands do not normally combine to form a double helix.

When referring to sequences of nucleotides in a polynucleotide, it is customary to use the abbreviation for the base; i.e., A, C, G, and T (or U) to represent the entire nucleotide containing that base. For example, a polynucleotide sequence denoted as “ACG” means that an adenine nucleotide is bonded through a phosphate ester linkage to a cytosine nucleotide that is bonded through another phosphate ester linkage to a guanine nucleotide. If the polynucleotide being described is DNA, then it is understood that “A” refers to an adenine nucleotide that contains a deoxyribose sugar. If there is any possibility of ambiguity, the “A” of a DNA molecule can be designated “deoxyA” or simply “dA.” The same is true for C and G. Since T occurs only in DNA and not RNA, there can be no ambiguity so there is no need to refer to deoxyT or dT.

As a rough approximation, it can be said that the number of genes an organism has is proportional to the organism's phenotypic complexity; i.e., the number of genome products necessary to replicate the organism and allow it to function. The human genome, presently considered one of the most complex, consists of approximately 60,000-100,000 genes and about three billion three hundred million base pairs. Each of these genes codes for an RNA, most of which in turn encodes a particular protein which performs a specific biochemical or structural function. A variance, also known as a polymorphism or mutation, in the genetic code of any one of these genes may result in the production of a gene product, usually a protein or an RNA, with altered biochemical activity or with no activity at all. This can result from as little change as an addition, deletion or substitution (transition or transversion) of a single nucleotide in the DNA comprising a particular gene, a change that is sometimes referred to as a “single nucleotide polymorphism” or “SNP.” The consequence of such a mutation in the genetic code ranges from harmless to debilitating to fatal. There are presently over 6700 human disorders believed to have a genetic component. For example, hemophilia, Alzheimer's disease, Huntington's disease, Duchenne muscular dystrophy and cystic fibrosis are known to be related to variances in the nucleotide sequence of the DNA comprising certain genes. In addition, evidence is being amassed suggesting that changes in certain DNA sequences may predispose an individual to a variety of abnormal conditions such as obesity, diabetes, cardiovascular disease, central nervous system disorders, auto-immune disorders and cancer. Variations in DNA sequence of specific genes have also been implicated in the differences observed among patients in their responses to, for example, drugs, radiation therapy, nutritional status and other medical interventions.
Thus, the ability to detect DNA sequence variances in an organism's genome is an important aspect of the inquiry into relationships between such variances and medical disorders and responses to medical interventions. Once an association has been established, the ability to detect the variance(s) in the genome of a patient can be an extremely useful diagnostic tool. It may even be possible, using early variance detection, to diagnose and potentially treat, or even prevent, a disorder before the disorder has physically manifested itself. Furthermore, variance detection can be a valuable research tool in that it may lead to the discovery of genetic bases for disorders the causes of which were hitherto unknown or thought to be other than genetic. Variance detection may also be useful for guiding the selection of an optimal therapy where there is a difference in response among patients to one or more proposed therapies.

While the benefits of being able to detect variances in the genetic code are clear, the practical aspects of doing so are daunting: it is estimated that sequence variations in human DNA occur with a frequency of about 1 in 100 nucleotides when 50 to 100 individuals are compared. Nickerson, D. A., Nature Genetics, 1998, 223-240. This translates to as many as thirty million variances in the human genome. Not all, in fact very few, of these variances have any measurable effect on the physical well-being of humans. Detecting these 30 million variances and then determining which of them are relevant to human health is clearly a formidable task.

In addition to variance detection, knowledge of the complete nucleotide sequence of an organism's genome would contribute immeasurably to the understanding of the organism's overall biology, i.e., it would lead to the identification of every gene product, its organization and arrangement in the organism's genome, the sequences required for controlling gene expression (i.e., production of each gene product) and replication. In fact, the quest for such knowledge and understanding is the raison d'être for the Human Genome Project, an international effort aimed at sequencing the entire human genome. Once the sequence of a single genome is available, whatever the organism, it then becomes useful to obtain the partial or complete sequence of other organisms of that species, particularly those organisms within the species that exhibit different characteristics, in order to identify DNA sequence differences that correlate with the different characteristics. Such different characteristics may include, for microbial organisms, pathogenicity on the negative side or the ability to produce a particular polymer or to remediate pollution on the positive side. A difference in growth rate, nutrient content or pest resistance is a potential difference that might be observed among plants. Even among human beings, a difference in disease susceptibility or response to a particular therapy might relate to a genetic, i.e., DNA sequence, variation. As a result of the enormous potential utility to be realized from DNA sequence information, in particular, identification of DNA sequence variances between individuals of the same species, the demand for rapid, inexpensive, automated DNA sequencing and variance detection procedures can be expected to increase dramatically in the future.

Once the DNA sequence of a DNA segment; e.g., a gene, a cDNA or, on a larger scale, a chromosome or an entire genome, has been determined, the existence of sequence variances in that DNA segment among members of the same species can be explored. Complete DNA sequencing is the definitive procedure for accomplishing this task. Thus, it is possible to determine the complete sequence of a copy of a DNA segment obtained from a different member of the species and simply compare that complete sequence to the one previously obtained. However, current DNA sequencing technology is costly, time consuming and, in order to achieve high levels of accuracy, must be highly redundant. Most major sequencing projects require a 5- to 10-fold coverage of each nucleotide to reach an acceptable error rate of 1 in 2,000 to 1 in 10,000 bases. In addition, DNA sequencing is an inefficient way to detect variances. For example, a variance between any two copies of a gene, for example when two chromosomes are being compared, may occur as infrequently as once in 1,000 or more bases. Thus, only a small portion of the sequence is of interest, i.e., that portion in which the variance exists. However, if full sequencing is employed, a tremendous number of nucleotides have to be sequenced to arrive at the desired information involving the aforesaid small portion. For example, consider a comparison of ten versions of a 3,000 nucleotide DNA sequence for the purpose of detecting, say, four variances among them. Even if only a 2-fold redundancy is employed (each strand of the double-stranded 3,000 nucleotide DNA segment from each individual is sequenced once), 60,000 nucleotides would have to be sequenced (10×3,000×2). In addition, it is more than likely that problem areas will be encountered in the sequencing requiring additional runs with new primers; thus, the project could engender the sequencing of as many as 100,000 nucleotides to determine four variances.
A variety of procedures have been developed over the past 15 years to identify sequence differences and to provide some information about the location of the variant sites (Table 1). Using such a procedure, it would only be necessary to sequence four relatively short portions of the 3000 nt (nucleotide) sequence. Furthermore, only a few samples would have to be sequenced in each region because each variance produces a characteristic change (Table 1) so, if, for example, 22 of 50 samples exhibit such a characteristic change with a variation detection procedure, then sequencing as few as four samples of the 22 would provide information on the other 18. The length of the segments that require sequencing could, depending on the variance detection procedure employed, be as short as 50-100 nt. Thus, the scale of the sequencing project could be reduced to: 4 (sites)×50 (nt per site)×2 (strands from each individual)×2 (individuals per site) or only about 800 nucleotides. This amounts to about 1% of the sequencing required in the absence of a preceding variance detection step.
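The arithmetic in the example above can be checked with a short Python sketch. The function and parameter names are illustrative assumptions; the numbers are those given in the text.

```python
def full_sequencing_load(samples, segment_nt, redundancy):
    """Nucleotides read when every sample is sequenced in full."""
    return samples * segment_nt * redundancy


def targeted_sequencing_load(sites, nt_per_site, strands, individuals_per_site):
    """Nucleotides read when only short regions around known variant sites are sequenced."""
    return sites * nt_per_site * strands * individuals_per_site


# Ten versions of a 3,000 nt segment, each strand sequenced once (2-fold redundancy).
full = full_sequencing_load(samples=10, segment_nt=3000, redundancy=2)

# Four variant sites, 50 nt per site, both strands, two individuals per site.
targeted = targeted_sequencing_load(sites=4, nt_per_site=50, strands=2,
                                    individuals_per_site=2)

ratio = targeted / full
print(full, targeted, f"{ratio:.1%}")
```

Against the 60,000 nt baseline the targeted effort is a little over 1%; against the 100,000 nt worst case mentioned in the text it is a little under 1%.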

As presently practiced, the technique for determining the full nucleotide sequence of a polynucleotide and that for detecting previously unknown variances or mutations in related polynucleotides end up being the same; that is, even when the issue is the presence or absence of a single nucleotide variance between related polynucleotides, the complete sequences of at least a segment of the related polynucleotides are determined and then compared. The only difference is that a variance detection procedure such as those described in Table 1 may be employed as a first step to reduce the amount of complete sequencing necessary in the detection of unknown variances.

TABLE 1. Summary of commonly used methods for discovery of DNA sequence variation. At the bottom the physical basis or output of each method is represented diagrammatically. The electrophoretic sequencing methods include a variety of enzymatic procedures for generating partial sequence ladders (for example, use of UTP and uracil glycosylase, or exonuclease degradation in the presence of boronated nucleotides). Methods for genotyping (i.e., testing specifically for the presence of a previously identified polymorphism) include many of those listed above plus others.

The two classical methods for carrying out complete nucleotide sequencing are the Maxam and Gilbert chemical procedure (Proc. Nat. Acad. Sci. USA, 74, 560-564 (1977)) and the Sanger, et al., chain-terminating procedure (Proc. Nat. Acad. Sci. USA, 74, 5463-5467 (1977)).

The Maxam-Gilbert method of complete nucleotide sequencing involves end-labeling a DNA molecule with, for example, ³²P, followed by one of two discrete reaction sequences involving two reactions each; i.e., four reactions overall. One of these reaction sequences involves the selective methylation of the purine nucleotides guanine (G) and adenine (A) in the polynucleotide being investigated which, in most instances, is an isolated naturally-occurring polynucleotide such as DNA. The N7 position of guanine methylates approximately five times as rapidly as the N3 position of adenine. When heated in the presence of aqueous base, the methylated bases are lost and a break in the polynucleotide chain occurs. The reaction is more effective with methylated guanine than with methylated adenine so, when the reaction product is subjected to electrophoresis on polyacrylamide gel plates, G cleavage residues are dark and A cleavage residues are light. This pattern can be reversed by using acid instead of heat to release the methylated bases. That is, using acid, the A cleavage residues show up dark on electrophoresis and the G cleavage residues show up light.

The second set of reaction sequences in the Maxam-Gilbert approach identifies cytosine and thymine cleavage residues. That is, the pyrimidine bases of which DNA is comprised, cytosine (C) and thymine (T), are, under the Maxam-Gilbert approach, differentiated by treatment of the isolated naturally-occurring polynucleotide with hydrazine, which reacts equally effectively with either base except in the presence of a high salt concentration where it reacts only with cytosine. Thus, depending on the conditions used, two series of bands can be generated on electrophoresis; in low salt, both C and T will be cleaved so the bands represent C+T; in high salt only C is cleaved so the bands will show C only.

Thus, four chemical reactions followed by electrophoretic analysis of the resulting end-labeled ladder of cleavage products will reveal the exact nucleotide sequence of a DNA molecule. It is key to the Maxam-Gilbert sequencing method that only partial cleavage, on the order of 1-2% at each susceptible position, occurs. This is because electrophoresis separates fragments by size. To be meaningful, the fragments produced should, on the average, represent a single modification and cleavage per molecule. Then, when the fragments of all four reactions are aligned according to size, the exact sequence of the target DNA can be determined.
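The logic of reading the four Maxam-Gilbert lanes can be illustrated with a simplified Python sketch. It is an idealized model, not a description of laboratory practice: band positions are treated as exact nucleotide positions counted from the labeled end, intensities are reduced to “dark” and “light,” and the lane names and functions are hypothetical.

```python
def simulate_lanes(seq):
    """Idealized band patterns for the four reactions described in the text."""
    lanes = {"G>A": {}, "A>G": {}, "C+T": set(), "C": set()}
    for pos, base in enumerate(seq, start=1):
        if base == "G":
            lanes["G>A"][pos] = "dark"   # base/heat release favors G
            lanes["A>G"][pos] = "light"
        elif base == "A":
            lanes["G>A"][pos] = "light"
            lanes["A>G"][pos] = "dark"   # acid release favors A
        elif base == "C":
            lanes["C+T"].add(pos)        # low salt: C and T both cleaved
            lanes["C"].add(pos)          # high salt: C only
        elif base == "T":
            lanes["C+T"].add(pos)
        else:
            raise ValueError(f"unexpected base {base!r}")
    return lanes


def read_ladder(lanes, length):
    """Align the four lanes by position and call one base per rung."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["C"]:
            calls.append("C")
        elif pos in lanes["C+T"]:
            calls.append("T")            # in C+T but absent from C-only lane
        elif lanes["G>A"].get(pos) == "dark":
            calls.append("G")
        elif lanes["A>G"].get(pos) == "dark":
            calls.append("A")
        else:
            raise ValueError(f"no band at position {pos}")
    return "".join(calls)


print(read_ladder(simulate_lanes("GATTACA"), 7))
```

Each position must appear in at least one lane, which is why the partial (1-2%) cleavage noted above matters: every fragment length has to be represented in the population.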

The Sanger method for determining complete nucleotide sequences consists of preparing four series of base-specifically chain-terminated, labeled DNA fragments by enzymatic polymerization. As in the Maxam-Gilbert procedure, four separate reactions can be performed or, if labeled dideoxynucleotide terminators are used, the reactions can all be carried out in the same test tube. In the Sanger method each of the four reaction mixtures contains the same oligonucleotide template (either a single- or a double-stranded DNA), the four nucleotides, A, G, C and T (one of which may be labeled), a polymerase and a primer, the polymerase and primer being present to effect the polymerization of the nucleotides into a complement of the template oligonucleotide. To one of the four reaction mixtures is added an empirically determined amount of the dideoxy derivative of one of the nucleotides. A small amount of the dideoxy derivative of one of the remaining three nucleotides is added to a second reaction mixture, and so on, resulting in four reaction mixtures each containing a different dideoxy nucleotide. The dideoxy derivatives, by virtue of their missing 3′-hydroxyl groups, terminate the enzymatic polymerization reaction upon incorporation into the nascent oligonucleotide chain. Thus, in one reaction mixture, containing, say, dideoxyadenosine triphosphate (ddATP), a series of oligonucleotide fragments are produced all ending in ddA which when resolved by electrophoresis produce a series of bands corresponding to the size of the fragment created up to the point that the chain-terminating ddA became incorporated into the polymerization reaction. Corresponding ladders of fragments can be obtained from each of the other reaction mixtures in which the oligonucleotide fragments end in C, G and T. The four sets of fragments create a “sequence ladder,” each rung of which represents the next nucleotide in the sequence of bases comprising the subject DNA.
Thus, the exact nucleotide sequence of the DNA can simply be read off the electrophoresis gel plate after autoradiography or computer analysis of chromatograms in the case of an automated DNA sequencing instrument. As mentioned above, dye-labeled chain-terminating dideoxynucleotides and modified polymerases that efficiently incorporate modified nucleotides represent an improved method for chain-terminating sequencing.
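How a sequence is read off a four-lane Sanger ladder can be sketched as follows. This is a minimal, idealized model: fragment lengths are counted from the first nucleotide added after the primer, every possible termination is assumed to be present, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_lanes(template):
    """Fragment lengths per dideoxy terminator for a template written 5'->3'."""
    # The polymerase builds the complement of the template, read 5'->3'.
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # a ddN fragment ends wherever N occurs
    return lanes


def read_sequence(lanes):
    """Merge the four lanes by fragment length; each rung is the next base."""
    rungs = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in rungs)


lanes = sanger_lanes("ATGC")
print(lanes)
print(read_sequence(lanes))
```

The sequence recovered is that of the newly synthesized strand, i.e., the reverse complement of the template.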

Both the Maxam-Gilbert and Sanger procedures have their shortcomings. They are both time-consuming, labor-intensive (particularly with regard to the Maxam-Gilbert procedure which has not been automated like the Sanger procedure), expensive (e.g., the most optimized versions of the Sanger procedure require very expensive reagents) and require a fair degree of technical expertise to assure proper operation and reliable results. Furthermore, the Maxam-Gilbert procedure suffers from a lack of specificity of the modification chemistry that can result in artifactual fragments resulting in false ladder readings from the gel plate. The Sanger method, on the other hand, is susceptible to template secondary structure formation that can cause interference in the polymerization reaction. This causes terminations of the polymerization at sites of secondary structure (called “stops”) which can result in erroneous fragments appearing in the sequence ladder rendering parts of the sequence unreadable, although this problem is ameliorated by the use of dye-labeled dideoxy terminators. Furthermore, both sequencing methods are susceptible to “compressions,” another result of DNA secondary structure which can affect fragment mobility during electrophoresis thereby rendering the sequence ladder unreadable or subject to erroneous interpretation in the vicinity of the secondary structure. In addition, both methods are plagued by uneven intensity of the ladder and by non-specific background interference. These concerns are magnified when the issue is variance detection. In order to discern a single nucleotide variance, the procedure employed must be extremely accurate; a “mistake” in reading one nucleotide can result in a false positive; i.e., an indication of a variance where none exists.
Neither the Maxam-Gilbert nor the Sanger procedure is capable of such accuracy in a single run. In fact, the frequency of errors in a “one pass” sequencing experiment is equal to or greater than 1%, which is on the order of ten times the frequency of actual DNA variances when any two versions of a sequence are compared. The situation can be ameliorated somewhat by performing multiple runs (usually in the context of a “shotgun” sequencing procedure) for each polynucleotide being compared, but this simply increases cost in terms of equipment, reagents, manpower and time. The high cost of sequencing becomes even less acceptable when one considers that it is often not necessary when looking for nucleotide sequence variances among related polynucleotides to determine the complete sequence of the subject polynucleotides or even the exact nature of the variance (although, as will be seen, in some instances even this is discernible using the method of this invention); detection of the variance alone may be sufficient.

While not avoiding all of the problems associated with the Maxam-Gilbert and Sanger procedures, several techniques have been devised to at least make one or the other of the procedures more efficient. One such approach has been to develop ways to circumvent slab gel electrophoresis, one of the most time-consuming steps in the procedures. For instance, in U.S. Pat. Nos. 5,003,059 and 5,174,962, the Sanger method is employed; however, the dideoxy derivative of each of the nucleotides used to terminate the polymerization reaction is uniquely tagged with an isotope of sulfur, ³²S, ³³S, ³⁴S or ³⁶S. Once the polymerization reactions are complete, the chain terminated sequences are separated by capillary zone electrophoresis, which, compared to slab gel electrophoresis, increases resolution, reduces run time and allows analysis of very small samples. The separated chain terminated sequences are then combusted to convert the incorporated isotopic sulfur to isotopic sulfur dioxides (³²SO₂, ³³SO₂, ³⁴SO₂ and ³⁶SO₂). The isotopic sulfur dioxides are then subjected to mass spectrometry. Since each isotope of sulfur is uniquely related to one of the four sets of base-specifically chain terminated fragments, the nucleotide sequence of the subject DNA can be determined from the mass spectrogram.

Another method, disclosed in U.S. Pat. No. 5,580,733, also incorporates the Sanger technique but eliminates gel electrophoresis altogether. The method involves taking each of the four populations of base-specific chain-terminated oligonucleotides from the Sanger reactions and forming a mixture with a visible laser light absorbing matrix such as 3-hydroxypicolinic acid. The mixtures are then illuminated with visible laser light and vaporized, which occurs without further fragmentation of the chain-terminated nucleic acid fragments. The vaporized molecules that are charged are then accelerated in an electric field and the mass to charge (m/z) ratio of the ionized molecules determined by time-of-flight mass spectrometry (TOF-MS). The molecular weights are then aligned to determine the exact sequence of the subject DNA. By measuring the mass difference between successive fragments in each of the mixtures, the lengths of fragments terminating in A, G, C or T can then be inferred. A significant limitation of current MS instruments is that polynucleotide fragments greater than 100 nucleotides in length (with many instruments, 50 nucleotides) cannot be efficiently detected in routine use, especially if the fragments are part of a complex mixture. This severe limitation on the size of fragments that can be analyzed has limited the development of polynucleotide analysis by MS. Thus, there is a need for a procedure that adapts large polynucleotides, such as DNA, to the capabilities of current MS instruments. The present invention provides such a procedure.
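The idea of inferring sequence information from mass differences between successive fragments can be illustrated with a simplified Python sketch. It is not the procedure of the cited patent: it assumes a single pooled ladder in which every fragment length is resolved, ignores end groups and adducts (lumped into a hypothetical constant `offset`), and uses approximate average masses for the four deoxynucleotide residues.

```python
# Approximate average masses (Da) of deoxynucleoside monophosphate residues
# within a DNA chain; illustrative values, not instrument calibration data.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ladder_masses(seq, offset=0.0):
    """Mass of each successively longer fragment of seq (idealized)."""
    masses, total = [], offset
    for base in seq:
        total += RESIDUE_MASS[base]
        masses.append(round(total, 2))
    return masses


def call_bases(masses, tolerance=1.0):
    """Assign a base to each step of the ladder from the mass difference."""
    calls = []
    for prev, cur in zip(masses, masses[1:]):
        delta = cur - prev
        base, err = min(((b, abs(delta - m)) for b, m in RESIDUE_MASS.items()),
                        key=lambda pair: pair[1])
        if err > tolerance:
            raise ValueError(f"mass step {delta:.2f} matches no residue")
        calls.append(base)
    return "".join(calls)


masses = ladder_masses("GATTACA")
print(call_bases(masses))  # bases 2 through 7; the first fragment has no predecessor
```

The smallest gap between two residue masses here is about 9 Da (T versus A), so the mass accuracy required grows more demanding as fragments get longer, which is consistent with the fragment-size limitation described above.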

A further approach to nucleotide sequencing is disclosed in U.S. Pat. No. 5,547,835. Again, the starting point is the Sanger sequencing strategy. The four base specific chain-terminated series of fragments are “conditioned” by, for example, purification, cation exchange and/or mass modification. The molecular weights of the conditioned fragments are then determined by mass spectrometry and the sequence of the starting nucleic acid is determined by aligning the base specific terminated fragments according to molecular weight.

Each of the above methods involves complete Sanger sequencing of a polynucleotide prior to analysis by mass spectrometry. To detect genetic mutations; i.e., variances, the complete sequence can be compared to a known nucleotide sequence. Where the sequence is not known, comparison with the nucleotide sequence of the same DNA isolated from another of the same organisms which does not exhibit the abnormalities seen in the subject organism will likewise reveal mutations. This approach, of course, requires running the Sanger procedure twice; i.e., eight separate reactions. In addition, if a potential variance is detected, the entire procedure would in most instances be run again, sequencing the opposite strand using a different primer to make sure that a false positive had not been obtained. When the specific nucleotide variance or mutation related to a particular disorder is known, there is a wide variety of known methods for detecting a variance without complete sequencing. For instance, U.S. Pat. No. 5,605,798 describes such a method. The method involves obtaining a nucleic acid molecule containing the target sequence of interest from a biological sample, optionally amplifying the target sequence, and then hybridizing the target sequence to a detector oligonucleotide which is specifically designed to be complementary to the target sequence. Either the detector oligonucleotide or the target sequence is “conditioned” by mass modification prior to hybridization. Unhybridized detector oligonucleotide is removed and the remaining reaction product is volatilized and ionized. Detection of the detector oligonucleotide by mass spectrometry indicates the presence of the target nucleic acid sequence in the biological sample and thus confirms the diagnosis of the variance-related disorder.

Variance detection procedures can be divided into two general categories although there is a considerable degree of overlap. One category, the variance discovery procedures, is useful for examining DNA segments for the existence, location and characteristics of new variances. To accomplish this, variance discovery procedures may be combined with DNA sequencing.

The second group of procedures, variance typing (sometimes referred to as genotyping) procedures, are useful for repetitive determination of one or more nucleotides at a particular site in a DNA segment when the location of a variance or variances has previously been identified and characterized. In this type of analysis, it is often possible to design a very sensitive test of the status of a particular nucleotide or nucleotides. This technique, of course, is not well suited to the discovery of new variances.

As noted above, Table 1 is a list of a number of existing techniques for nucleotide examination. The majority of these are used primarily in new variance determination. There are a variety of other methods, not shown, for gene typing. Like the Maxam-Gilbert and Sanger sequencing procedures, these techniques are generally time-consuming and tedious, and require a relatively high skill level to achieve the maximum degree of accuracy possible from each procedure. Even then, some of the techniques listed are, at best, inherently less accurate than would be desirable.

The methods of Table 1, though primarily devised for variance discovery, can also be used when a variant nucleotide has already been identified and the goal is to determine its status in one or more unknown DNA samples (variance typing or genotyping). Some of the methods that have been developed specifically for genotyping include: (1) primer extension methods in which dideoxynucleotide termination of the primer extension reaction occurs at the variant site generating extension products of different length or with different terminal nucleotides, which can then be determined by electrophoresis, mass spectrometry or fluorescence in a plate reader; (2) hybridization methods in which oligonucleotides corresponding to the two possible sequences at a variant site are attached to a solid surface and hybridized with probes from the unknown sample; (3) restriction fragment length polymorphism analysis, wherein a restriction endonuclease recognition site includes the polymorphic nucleotide in such a manner that the site is cleavable with one variant nucleotide but not another; (4) methods such as “TaqMan” involving differential hybridization and consequent differential 5′ endonuclease digestion of labeled oligonucleotide probes in which there is fluorescent resonance energy transfer (FRET) between two fluors on the probe that is abrogated by nuclease digestion of the probe; (5) other FRET based methods involving labeled oligonucleotide probes called molecular beacons which exploit allele specific hybridization; (6) ligation dependent methods that require enzymatic ligation of two oligonucleotides across a polymorphic site that is perfectly matched to only one of them; and, (7) allele specific oligonucleotide priming in a polymerase chain reaction (PCR). U. Landegren, et al., 1998, Reading Bits of Genetic Information: Methods for Single-nucleotide Polymorphism Analysis, Genome Research 8(8):769-76.
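Of the genotyping approaches listed above, restriction fragment length polymorphism analysis (item 3) lends itself to a compact illustration. The following Python sketch is a simplified single-strand model; the two allele sequences are made up for the example, and the enzyme is modeled on EcoRI, which recognizes GAATTC and cuts after the first G.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for the EcoRI-like G^AATTC cut assumed here).
    """
    lengths, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)
    return lengths


# Hypothetical alleles differing at one nucleotide inside the recognition site.
allele_1 = "ACGTACGAATTCGGTACC"  # site intact: cleavable
allele_2 = "ACGTACGAGTTCGGTACC"  # A->G within the site: not cleavable

print(digest(allele_1))
print(digest(allele_2))
```

The two alleles give different fragment-length patterns on digestion, which is what an electrophoretic readout would distinguish.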

When complete sequencing of large templates such as the entire genome of a virus, a bacterium or a eukaryote (i.e., higher organisms including man) or the repeated sequencing of a large DNA region or regions from different strains or individuals of a given species for purposes of comparison is desired, it becomes necessary to implement strategies for making libraries of templates for DNA sequencing. This is because conventional chain terminating sequencing (i.e., the Sanger procedure) is limited by the resolving power of the analytical procedure used to create the nucleotide ladder of the subject polynucleotide. For gels, this resolving power is approximately 500-800 nt at a time. For mass spectrometry, the limitation is the length of a polynucleotide that can be efficiently vaporized prior to detection in the instrument. Although larger fragments have been analyzed by highly specialized procedures and instrumentation, presently this limit is approximately 50-60 nt. However, in large scale sequencing projects such as the Human Genome Project, “markers” (DNA segments of known chromosomal location whose presence can be relatively easily ascertained by the polymerase chain reaction (PCR) technique and which, therefore, can be used as a point of reference for mapping new areas of the genome) are currently about 100 kilobases (kb) apart. The markers at 100 kb intervals must be connected by efficient sequencing strategies. If the analytical method used is gel electrophoresis, then to sequence a 100 kb stretch of DNA would require hundreds of sequencing reactions. A fundamental question which must be addressed is how to divide up the 100 kb segment (or whatever size is being dealt with) to optimize the process; i.e., to minimize the number of sequencing reactions and sequence assembly work necessary to generate a complete sequence with the desired level of accuracy.
A key issue in this regard is how to initially fragment the DNA in such a manner that the fragments, once sequenced, can be correctly reassembled to recreate the full-length target DNA. Presently, two general approaches provide both sequence-ready fragments and the information necessary to recombine the sequences into the full-length target DNA: “shotgun sequencing” (see, e.g., Venter, J. C., et al., Science, 1998, 280:1540-1542; Weber, J. L. and Myers, E. W., Genome Research, 1997, 7:401-409; Andersson, B. et al., DNA Sequence, 1997, 7:63-70) and “directed DNA sequencing” (see, e.g., Voss, H., et al., Biotechniques, 1993, 15:714-721; Kaczorowski, T., et al., Anal. Biochem., 1994, 221:127-135; Lodhi, M. A., et al., Genome Research, 1996, 6:10-18).
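The scale of the problem for a 100 kb interval can be estimated with a back-of-the-envelope Python sketch. The function name and the assumption of perfectly tiled, non-overlapping reads are illustrative simplifications; real projects need overlap and redundancy, which only increase the count.

```python
import math


def reads_needed(region_nt, read_nt, redundancy=1):
    """Minimum number of reads to cover region_nt at the given redundancy,
    assuming ideal end-to-end tiling with no wasted sequence."""
    return math.ceil(region_nt * redundancy / read_nt)


region = 100_000  # 100 kb between markers

# Gel electrophoresis resolves roughly 500-800 nt per reaction.
print(reads_needed(region, 800), reads_needed(region, 500))

# Mass spectrometry is limited to roughly 50-60 nt per fragment.
print(reads_needed(region, 60), reads_needed(region, 50))
```

Even under these ideal assumptions, gel-based reads number in the hundreds for a single pass, and fragments short enough for mass spectrometry number in the thousands.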

Shotgun sequencing involves the creation of a large library of random fragments or “clones” in a sequence-ready vector such as a plasmid or phagemid. To arrive at a library in which all portions of the original sequence are relatively equally represented, DNA that is to be shotgun sequenced is often fragmented by physical procedures such as sonication, which has been shown to produce nearly random fragmentation. Clones are then selected at random from the shotgun library for sequencing. The complete sequence of the DNA is then assembled by identifying overlapping sequences in the short (approx. 500 nt) shotgun sequences. In order to assure that the entire target region of the DNA is represented among the randomly selected clones and to reduce the frequency of errors (incorrectly assigned overlaps), a high degree of sequencing redundancy is necessary; for example, 7 to 10-fold. Even with such high redundancy, additional sequencing is often required to fill gaps in the coverage. Even then, the presence of repeat sequences such as Alu (a 300 base-pair sequence which occurs in 500,000-1,000,000 copies per haploid genome) and LINES (“Long INterspersed DNA sequence Elements” which can be 7,000 bases long and may be present in as many as 100,000 copies per haploid genome), either of which may occur in different locations of multiple clones, can render DNA sequence re-assembly problematic. For instance, different members of these sequence families can be over 90% identical, which can sometimes make it very difficult to determine sequence relationships on opposite sides of such repeats. Figure X illustrates the difficulties of the shotgun sequencing approach in a hypothetical 10 kb sequence modeled after the sequence reported in Martin-Gallardo, et al., Nature Genetics, (1992) 1:34-39.
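The cost of the redundancy discussed above can be estimated as follows. This is a sketch under stated assumptions: reads are taken to start at uniformly random positions, and the uncovered fraction is approximated by the Poisson (Lander-Waterman style) term exp(-c), a model that is not part of the present description. The function names are hypothetical.

```python
import math


def reads_for_redundancy(target_nt: int, read_nt: int, redundancy: float) -> int:
    """Number of reads whose total length equals redundancy x target length."""
    return math.ceil(redundancy * target_nt / read_nt)


def expected_uncovered_fraction(redundancy: float) -> float:
    """Poisson approximation (an assumption): fraction of bases hit by no read."""
    return math.exp(-redundancy)


TARGET_NT = 100_000  # illustrative target size
READ_NT = 500        # approximate shotgun read length from the text

for c in (7, 10):
    n = reads_for_redundancy(TARGET_NT, READ_NT, c)
    gap = expected_uncovered_fraction(c) * TARGET_NT
    print(f"{c}-fold: {n} reads, about {gap:.1f} bases expected uncovered")
```

Under this simple model a small number of bases may remain uncovered even at 7-fold redundancy, which is consistent with the statement that additional sequencing is often required to fill gaps. The model ignores repeats, which are the harder problem described above.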

Directed DNA sequencing, the second general approach, also entails making a library of clones, often with large inserts (e.g., cosmid, P1, PAC or BAC libraries). In this procedure, the location of the clones in the region to be sequenced is then mapped to obtain a set of clones that constitutes a minimum-overlap tiling path spanning the region to be sequenced. Clones from this minimal set are then sequenced by procedures such as “primer walking” (see, e.g., Voss, supra). In this procedure, the end of one sequence is used to select a new sequencing primer with which to begin the next sequencing reaction, the end of the second sequence is used to select the next primer and so on. The assembly of a complete DNA is easier by directed sequencing and less sequencing redundancy is required since both the order of clones and the completeness of coverage are known from the clone map. On the other hand, assembling the map itself requires significant effort. Furthermore, the speed with which new sequencing primers can be synthesized and the cost of doing so are often limiting factors with regard to primer walking. While a variety of methods for simplifying new primer construction have aided in this process (see, e.g., Kaczorowski, et al. and Lodhi, et al., supra), directed DNA sequencing remains a valuable but often expensive and slow procedure.

Most large-scale sequencing projects employ aspects of both shotgun sequencing and directed sequencing. For example, a detailed map might be made of a large insert library (e.g., BACs) to identify a minimal set of clones which gives complete coverage of the target region but then sequencing of each of the large inserts is carried out by a shotgun approach; e.g., fragmenting the large insert and re-cloning the fragments in a more optimal sequencing vector (see, e.g., Chen, C. N., Nucleic Acids Research, 1996, 24:4034-4041). The shotgun and directed procedures are also used in a complementary manner in which specific regions not covered by an initial shotgun experiment are subsequently determined by directed sequencing.

Thus, there are significant limitations to both the shotgun and directed sequencing approaches to complete sequencing of large molecules such as that required in genomic DNA sequencing projects. However, both procedures would benefit if the usable read length of contiguous DNA were expanded from the current 500-800 nt which can be effectively sequenced by the Sanger method. For example, directed sequencing could be significantly improved by reducing the need for high resolution maps, which could be achieved by longer read lengths, which in turn would permit greater distances between landmarks.

A major limitation of current sequencing procedures is the high error rate (Kristensen, T., et al., DNA Sequencing, 2:243-346, 1992; Kurshid, F. and Beck, S., Analytical Biochemistry, 208:138-143, 1993; Fichant, G. A. and Quentin, Y., Nucleic Acids Research, 23:2900-2908, 1995). It is well known that many of the errors associated with the Maxam-Gilbert and Sanger procedures are systematic; i.e., the errors are not random; rather, they occur repeatedly. To avoid this, two mechanistically different sequencing methods may be used so that the systematic errors in one may be detected and thus corrected by the second and vice versa. Since a significant fraction of the cost of current sequencing methods is associated with the need for high redundancy to reduce sequencing errors, the use of two procedures can reduce the overall cost of obtaining highly accurate DNA sequence.

The production and/or chemical cleavage of polynucleotides composed of ribonucleotides and deoxyribonucleotides has been previously described. In particular, mutant polymerases that incorporate both ribonucleotides and deoxyribonucleotides into a polynucleotide have been described. Production of mixed ribo- and deoxyribo-containing polynucleotides by polymerization has been described; and generation of sequence ladders from such mixed polynucleotides, exploiting the well known lability of the ribo sugar to chemical base, has been described.

The use of such procedures, however, has been limited in that: (i) only polynucleotides in which one ribonucleotide and three deoxyribonucleotides are incorporated have been used; (ii) cleavage at ribonucleotides is effected using chemical base; (iii) only partial cleavage of the ribonucleotide-containing polynucleotides is pursued; and (iv) the utility of the procedure is confined to production of sequence ladders, which are resolved electrophoretically.

In addition, the chemical synthesis of polynucleotide primers containing a single ribonucleotide, which at a subsequent step is substantially completely cleaved by chemical base, has been reported. The size of a primer extension product is then determined by mass spectrometry or other methods.

An extension of nucleic acid sequence determination is the rapid identification of polymorphisms or sequence variations within polynucleotide regions. Assays for single nucleotide polymorphisms, SNPs, attempt to discriminate between two DNA sequences that differ at a single base position. Hybridization based methods for accomplishing this take advantage of the fact that a probe sequence that is exactly complementary to a test sequence will hybridize stringently in a “perfectly matched” duplex, whereas a probe/test sequence duplex containing one or more mis-matched base-pairs will either not hybridize at all or will hybridize less stringently. Thus, if a probe sequence were complementary to the sequence of one allele of a SNP, the probe would be expected to hybridize more stringently to that allele than to an alternate allele, which carries a single-base mis-match relative to the probe.
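The distinction between a perfectly matched duplex and a single-base mis-match can be made concrete with a small sketch. The allele sequences below are hypothetical and serve only as an illustration; the code counts mis-matched base pairs and does not model hybridization stringency itself.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs perfectly (antiparallel) with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatches(probe: str, target: str) -> int:
    """Count mis-matched base pairs when probe anneals antiparallel to target."""
    if len(probe) != len(target):
        raise ValueError("probe and target must be the same length in this sketch")
    perfect_partner = reverse_complement(target)
    return sum(p != q for p, q in zip(probe, perfect_partner))


# Hypothetical alleles differing at a single base position (A versus G).
ALLELE_1 = "GATTACAGGCTA"
ALLELE_2 = "GATTACGGGCTA"

# A probe designed to be exactly complementary to allele 1.
PROBE_1 = reverse_complement(ALLELE_1)

print(mismatches(PROBE_1, ALLELE_1))  # perfectly matched duplex
print(mismatches(PROBE_1, ALLELE_2))  # single-base mis-match
```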

A number of different nucleic acid hybridization assays have been described which utilize solid supports. One such group of assays involves oligonucleotide probes that are attached to a solid matrix, such as a microchip, capillary tube, glass-slide or microbead. This method is the subject of U.S. Pat. Nos. 5,858,659; 5,981,176; 6,045,996; 5,578,458 and 5,759,779. SNPs are detected by the difference in stringency of hybridization of the probe to sequences that include the SNP compared to sequences that do not.

An alternative approach to the above is to immobilize the test sequence and then bring it into contact with “free” labeled probe, in which case the probe will only hybridize (or will more stringently hybridize) with those test sequences that form a “perfectly matched duplex” with it.

Another method for SNP detection uses immobilized oligonucleotide primers, allele specific hybridization, and polymerase extension in the presence of one or more dideoxynucleotide terminating nucleotides. In this method, the dideoxynucleotide terminator is labeled to detect the polymorphic sequence differences in test nucleotide sequence samples. This method is the subject of U.S. Pat. Nos. 5,610,287 and 6,030,782.

A still further approach to SNP detection using solid support methodology involves allele-specific amplification in which primers are designed to specifically amplify sequence variations within the test sequence samples. In this method, detection is achieved by immobilizing either the amplified nucleotide fragments or an allele specific oligonucleotide probe, either of which is labeled for ease of detection.

The above methods suffer from difficulties in allele-specific amplification by PCR. These include (i) the inherent limitations of PCR with regard to length of amplification product and background; (ii) primer extension from a mismatched primer-template complex, as a result of which the non-matching allele is amplified along with the primer-matching allele; and (iii) because different DNA samples will be heterozygous at different combinations of nucleotides, different primers and assay conditions must be established for each pair of polymorphic sites that are to be identified.

Typically, for a SNP assay, the test sequence containing the polymorphic site (which can exist as either of two alleles “A₁” and “A₂”) is amplified from genomic DNA by PCR, producing a product that is a mixture of fragments amplified from each member of the relevant chromosome pair. The PCR products may be labeled, e.g., by the incorporation of radioactive or fluorescent tags during PCR. The SNP is identified by denaturing the labeled PCR products and hybridizing the mixture to two oligonucleotides, A_(1p) and A_(2p), which are oligonucleotide probes specific to the respective alleles.

Samples that are homozygous for allele “A₁” should hybridize more strongly to oligonucleotide A_(1p), producing a strong hybridization signal. Ideally, there should be little, or no, hybridization to oligonucleotide A_(2p) (because of the single base mis-match), presumably allowing the genotype at the SNP to be identified unambiguously. Similarly, samples from individuals homozygous for allele “A₂” should hybridize more strongly to oligonucleotide A_(2p) and not to oligonucleotide A_(1p). Samples from individuals heterozygous at this SNP should hybridize equally to both

Table 2 shows the molecular weights of the four DNA nucleotide monophosphates and the mass difference between each pair of nucleotides.
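The kind of pairwise comparison summarized in Table 2 can be sketched as follows. The masses used here are approximate average molecular weights of the free 2′-deoxynucleoside 5′-monophosphates, supplied as an assumption for illustration; they are not copied from Table 2 and may differ from it in the last decimal place or in the chemical form tabulated.

```python
from itertools import combinations

# Approximate average masses (Da) of the free acid monophosphates.
# Assumed illustrative values, not taken from Table 2.
MONOPHOSPHATE_MASS = {"A": 331.22, "C": 307.20, "G": 347.22, "T": 322.21}


def pairwise_mass_differences(masses: dict) -> dict:
    """Absolute mass difference for every unordered pair of nucleotides."""
    return {
        (a, b): round(abs(masses[a] - masses[b]), 2)
        for a, b in combinations(sorted(masses), 2)
    }


differences = pairwise_mass_differences(MONOPHOSPHATE_MASS)
for pair, delta in sorted(differences.items(), key=lambda item: item[1]):
    print(f"{pair[0]}/{pair[1]}: {delta:.2f} Da")
```

With these values the A/T pair gives the smallest difference and the C/G pair the largest, so an A to T substitution is the most demanding point mutation to resolve by mass.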

Table 3A shows the masses of all possible 2 mers, 3 mers, 4 mers and 5 mers of the DNA nucleotides in Table 2.

Table 3B shows the eight possible sets of isobaric oligonucleotides.

Table 4 shows the masses of all possible 2 mers, 3 mers, 4 mers, 5 mers, 6 mers and 7 mers that would be produced by cleavage at one of the four nucleotides and the mass differences between neighboring oligonucleotides.

Table 5 shows the mass changes that will occur for all possible point mutations (replacement of one nucleotide by another) and the theoretical maximum size of a polynucleotide in which a point mutation should be detectable by mass spectrometry using mass spectrometers of varying resolving powers.
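The relationship between resolving power and maximum polynucleotide size can be approximated with a simple model. This sketch assumes that resolving power is defined as m/Δm, that a point mutation is detectable when the polynucleotide mass does not exceed resolving power times the mass change, and that the mean nucleotide residue mass is about 309 Da. These are simplifying assumptions and are not necessarily the basis on which Table 5 was calculated.

```python
def max_detectable_length(
    delta_mass_da: float,
    resolving_power: float,
    mean_residue_mass_da: float = 309.0,  # assumed average residue mass
) -> int:
    """Largest polynucleotide length (nt) at which a mass change of
    delta_mass_da is still resolved, taking resolving power as m / delta_m."""
    max_mass_da = resolving_power * delta_mass_da
    return int(max_mass_da // mean_residue_mass_da)


# Illustrative mass changes (Da): roughly 9 for A<->T and 40 for C<->G.
for label, delta in (("A<->T", 9.01), ("C<->G", 40.02)):
    for power in (1_000, 10_000):
        length = max_detectable_length(delta, power)
        print(f"{label}, resolving power {power}: up to about {length} nt")
```

The point of the sketch is the proportionality: the smaller the mass change of a given substitution, the shorter the polynucleotide in which it can be resolved at a given resolving power.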

Table 6 shows the actual molecular weight differences observed in an oligonucleotide using the method of this invention; the difference reveals a hitherto unknown variance in the oligonucleotide.

Table 7 shows all of the masses obtained by cleavage of an exemplary 20 mer in four separate reactions, each reaction being specific for cleavage at one of the DNA nucleotides; i.e., at A, C, G and T.
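The four base-specific cleavage reactions can be mimicked in silico. The following is a sketch only: the 20 mer shown is hypothetical and is not the exemplary 20 mer of Table 7, and the convention that the cleaved nucleotide stays at the 5′ end of the downstream fragment is an assumption made for illustration. The actual fragment termini, and hence the masses, depend on the cleavage chemistry used.

```python
def cleave_at(seq: str, base: str) -> list:
    """Split seq immediately 5' of every occurrence of base.

    Assumed convention: the cleaved nucleotide remains with the 3' fragment.
    """
    fragments = []
    start = 0
    for i, nucleotide in enumerate(seq):
        if nucleotide == base and i > start:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])
    return fragments


HYPOTHETICAL_20MER = "ACGTTGCAAGCTTACGGATC"  # illustrative sequence only

# One reaction per nucleotide, as in the four separate reactions of Table 7.
for base in "ACGT":
    fragments = cleave_at(HYPOTHETICAL_20MER, base)
    lengths = [len(f) for f in fragments]
    print(f"cleavage at {base}: {fragments} lengths={lengths}")
```

Each reaction yields a different set of fragment lengths from the same sequence, which is why the four reactions taken together are more informative than any one alone.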

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows one plus strand primer and two minus strand primers used to produce 66 nucleotide (nt) PCR products from the human Replication Factor C (RFC) gene (38 kDa subunit). RFCbio+RFC was used to amplify the RFC sequence in GenBank, while RFCbio+RFCmut was used to amplify a mutant sequence containing a C in place of a T, 4 nucleotides from the 5′ end of RFCmut.

FIG. 2 shows the length and mass of the cleavage products anticipated from incorporation of 7-methyl-dGTP into the extension product followed by cleavage with piperidine. Only one fragment is expected to change in mass; i.e., the 3′ terminal 10 mer.

FIG. 3 shows a 10% polyacrylamide gel analysis of the primer extension products shown in FIG. 2 after full substitution of 7-methyl-dGTP for dGTP and cleavage with piperidine for one hour at 90° C. The cleavage products (lanes 2 and 4) correspond with those predicted in FIG. 2, although the two 10 mers overlap and cleavage is incomplete, possibly due to partial cleavage at the three consecutive G residues adjacent to the variant 10 mer. A 9 mer is seen in lane 4 (RFCmut) that is absent in lane 2 (RFC).

FIG. 4A shows the MALDI-TOF mass spectrum of the fragments of the extension product (T-variant) shown in FIG. 2 after full substitution of 7-methyl-dGTP for dGTP and cleavage with piperidine. The insert is a blow-up of that region of the spectrogram containing the two 10 mers.

FIG. 4B shows the expected masses of the 8 mer and the two 10 mers from FIG. 4A. Although mass accuracy is off by about 20 Da, the differences are very close to the predicted values: 511.97 for the difference between the 8 mer and the invariant 10 mer compared to 512.32 predicted and 15.39 for the difference between the 10 mers compared to 15.02 expected.

FIG. 5A shows the MALDI-TOF mass spectrum of the fragments of the extension product (C-variant) shown in FIG. 2 after full substitution of 7-methyl-dGTP for dGTP and cleavage with piperidine. The primer mass, 6575.79, appears to the right of the spectrum while the two 10 mers and the 8 mer appear to the left and in the insert.

FIG. 5B shows the expected masses of the 8 mer and the two 10 mers from FIG. 5A. The mass difference between the invariant 10 mer and the 8 mer is very close to the predicted value (512.74 found, 512.32 predicted), while the mass difference between the two 10 mers is far from the predicted value (319.93 found, 30.04 predicted).

FIG. 6A shows the MALDI-TOF mass spectrum of the RFC and RFCmut primers used to produce the extension products shown in FIGS. 3 and 5.

FIG. 6B shows the expected masses and mass differences of the two primers as well as the expected mass difference of an RFCmut missing a G. The spectrum in FIG. 6A suggests that the latter was in fact the case; apparently, the primer received from the commercial source was missing a G, which explains the indicated discrepancies in both FIG. 3 and FIG. 5.

FIG. 7A shows that substitution of 5′-amino-dTTP for dTTP had no ill effect on primer extension (lane 1 is the natural extension product, lane 3 is the extension product with 5′-NH₂-dTTP for dTTP substitution). The effect of treatment with glacial acetic acid is shown in lanes 2 (natural extension product, no effect), 4 (nucleotide substitution, 1 hour treatment) and 5 (nucleotide substitution, 2 hour treatment).

FIG. 7B shows the chemical structures of 5′-amino-dT, dG, dC and dA.

FIG. 8A shows the result of primer extension of a 7.2 kb M13 template in the presence of 5′-NH₂-dTTP and subsequent restriction with Msc I before heat denaturing of the extension product, which results in mostly the 7.2 kb product.

FIG. 8B shows the result of restriction with Msc I after heat denaturing, which gives a 1.2 kb product.

FIG. 8C shows a proposed mechanism that would afford the 1.2 kb product by restriction after heat denaturing.

FIG. 9 is an autoradiogram showing the result of cleavage of the 1.2 kb Msc I restriction product at the sites of incorporation of 5′-amino-dTTP with acetic acid and resolution of the fragments by denaturing acrylamide gel electrophoresis.

FIG. 10 shows the result of ion pair reverse phase HPLC separation of Hae III DNA restriction fragments from PhiX174. The fragment lengths are shown above the peaks. Resolution was performed on a Micra Scientific NPS C18 1.5 μm column at 63° C. using 0.1 M TEAA, pH 8.3, as buffer A and 50% CH₃CN, 0.1 M TEAA, pH 8.3, as buffer B.

FIG. 11A is a comparison of Sanger-type sequencing with the modified nucleotide incorporation/cleavage procedure of the present invention. Lane 1 is the TaqFS extension product with no dideoxynucleotides. Lane 2 is the purified Klenow (exo-) extension product substituting 5′-NH₂-dTTP for dTTP. Lanes 3, 4, 5, 6, 8, 9, 10 and 11 are the Sanger fragment ladders. Lanes 7 and 12 are the ladders obtained by the chemical cleavage method of this invention (T ladder only).

FIG. 11B is a schematic representation of the present invention using modified nucleotide incorporation/cleavage to sequence DNA compared to the Sanger sequencing method. The asterisk (*) represents a dye or isotopic label. The 4^(th) sequence, which has no ddT at the end, signifies non-specific polymerization termination caused by secondary structure and/or other phenomena that result in background noise. The same non-specific polymerization termination in the method of the present invention (the short extension product) does not contribute to background because subsequent acid cleavage removes the non-specific sequences, as shown.

FIG. 12 demonstrates the dinucleotide cleavage method of the present invention. As can be seen, cleavage only occurs when the ribo-C and the thio-A are adjacent to one another (column 1). If either of the modified nucleotides is not positioned properly, very little (column 2) or no (columns 3 and 4) cleavage results.

FIG. 13 is a graph depicting the efficiency of variance detection as a function of polynucleotide length. 1s1n refers to single strand, one modified nucleotide; 1s2n to single strand, two different nucleotides in separate reactions; 2s1n to two strands analyzed separately with the same nucleotide in each; 2s2n to two strands, two different nucleotides; and Di to dinucleotide cleavage (all possible dinucleotide combinations). As can be seen, even single strand, single nucleotide cleavage is up to 85% efficient at detecting all variances in a 250 mer polynucleotide.

FIGS. 14 through 18 show various aspects of long range DNA sequencing using chemically cleavable modified nucleotides.

FIG. 14A illustrates a hypothetical human DNA sequence modeled after data reported in Martin-Gallardo, et al., Nature Genetics, 1992, 1:34-39. The consensus length of the Alu repeat elements is 280 nucleotides. The partial L1 element is approximately 850 nucleotides long.

FIG. 14B illustrates the distribution of DNA sequences obtained by shotgun sequencing with 7-fold redundant coverage. Sequences are represented by horizontal black lines while repeat elements are represented by shading behind the sequences to illustrate the fact that many sequence reads start or end in Alu or L1 repeat sequences, which hinders definitive assignment of sequence overlaps.

FIG. 14C illustrates the same analysis using the method of this invention with full or partial substitution of a modified nucleotide for a natural nucleotide followed by cleavage and analysis of the fragments. The steps to achieve this result are depicted in FIGS. 15-18.

FIG. 15 illustrates the steps for sequencing a 2.7 kb double stranded DNA using the method of this invention and 5′-amino nucleoside triphosphates as the modified nucleotides. Step A: linearize pUC19 with HincII (or perform primer extension using a circular duplex template); denature duplex DNA. Step B: primer extend in the presence of four dNTPs and one 5′-NH₂-dNTP at a ratio that produces partial substitution of the 5′-amino nucleotide for the natural nucleotide; purify extension product. Step C: digest with Dde I to give fragments shown. Step D: end-label recessed Dde I ends with rhodamine-dUTP (R 110) by polymerase fill-in of the ends (Klenow exo- polymerase). Step E: fractionate labeled digestion products; cleave with acid; analyze fragments using, for example, capillary electrophoresis.

FIG. 16 shows the separation by HPLC of fragments from Dde I restriction endonuclease digested, rhodamine dUTP end-labeled pUC19 DNA. The fragments were resolved using an HP ZORBAX-Eclipse HPLC column at 45° C. and 0.1 M TEM, pH 7.0, 0.1 mM EDTA as buffer A and 25% CH₃CN, 75% buffer A as buffer B; gradient 60-34% A over 2 minutes, 34-20% A over 22 minutes and 20-0% A over 1 minute.

FIG. 17 shows a comparison of long range sequencing using the method of this invention (5′-amino nucleoside triphosphate modified nucleotides) with dideoxy sequencing. The first row of panels shows the result of the dideoxy chain termination reactions: loss of signal by 1 kb. The second row of panels shows the results using partial substitution with a modified nucleotide followed by chemical cleavage: strong signal to 4 kb. The third row of panels relates to molecular size markers from 100 nt to 4,000 nt.

FIG. 18 is a comparison of sequencing ladders obtained by chain termination (Sanger) sequencing (the ddA lane) with those obtained by the method of this invention using 5′-amino-A with progressively greater amounts of acid in the cleavage reaction.

FIG. 19A shows the results of digestion of a 700 nt DNA fragment with Alu I, the vertical marks indicating the sites of cleavage.

FIG. 19B shows the results of cleavage using the dinucleotide method of this invention with 12 possible dinucleotide pairs. Dinucleotide cleavage produces a median fragment size of 16 nucleotides.
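Fragment size statistics of the kind reported for dinucleotide cleavage can be computed for any sequence with a short routine. This is a minimal sketch: the sequence is hypothetical, and cleavage is assumed, for illustration only, to occur between the two nucleotides of each occurrence of the chosen dinucleotide.

```python
from statistics import median


def cleave_at_dinucleotide(seq: str, dinucleotide: str) -> list:
    """Cut seq between the two bases of every occurrence of dinucleotide."""
    cuts = [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


HYPOTHETICAL_SEQ = "AAGTCCGTAAGT"  # illustrative sequence only

fragments = cleave_at_dinucleotide(HYPOTHETICAL_SEQ, "GT")
lengths = [len(f) for f in fragments]
print(fragments, "median length:", median(lengths))
```

Because a given dinucleotide occurs less often than a given mononucleotide, dinucleotide cleavage yields longer fragments than mononucleotide cleavage but far more cut sites than a typical restriction enzyme with a four base or longer recognition site.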

FIG. 20 illustrates the dinucleotide cleavage method of this invention using a ribonucleotide and a 5′-amino nucleotide in a 5′ to 3′ orientation. The products of cleavage are shown.

FIG. 21 illustrates the method of this invention involving incorporation of two different modified nucleotides in the same DNA strand and cleavage by two different chemical means to produce two different sequence ladders. The primer sequence is underlined. T nucleotides are numbered above the sequence and G nucleotides are numbered below the sequence. In the ladder, lane 1 is the extension product using ribo-GTP, lane 2 is the result of cleavage of the lane 1 product with chemical base, lane 3 is the extension product incorporating 5′-aminoTTP, lane 4 is the result of cleavage of the lane 3 product with acid, lane 5 is the extension product containing both ribo-GTP and 5′-aminoTTP, lane 6 is the result of cleavage of the lane 5 product with acid and lane 7 is the result of cleavage of the lane 5 product with chemical base.

FIG. 22 illustrates dinucleotide cleavage at GT in one allele of the transferrin receptor gene. Primer extension was carried out using rGTP, 5′-aminoTTP, dCTP, dGTP and α-³²P-dCTP (for body-labeling of DNA fragments). A 1:4 mixture of Klenow (exo-) and E710A Klenow (exo-) was used for extension. Lane 1 is the full length 87 nt fragment extended with natural dNTPs, lane 2 is the result of dinucleotide cleavage of the extension product containing rGTP and 5′-aminoTTP and lane 3 contains molecular weight markers, 12 nt to 32 nt.

FIG. 23 illustrates dinucleotide cleavage at AT in the serine allele of the transferrin receptor gene. The primer is lightly underlined. The heavy underlining shows the expected fragments from AT dinucleotide cleavage. Lane 1 is the molecular size marker, lane 2 is the result of dinucleotide cleavage at the sites of incorporation of modified A adjacent to modified T.

FIG. 24 shows the MALDI-TOF mass spectra of the AT dinucleotide cleavage products from the 87 nt transferrin receptor fragment of FIG. 23. All fragments are observed except for a 2 nt fragment.

FIG. 25 illustrates the primer extension of M13 mp18 DNA followed by dinucleotide cleavage at AT sites. The occurrence of AT dinucleotides is shown for the first 257 nucleotides, as are the expected products of AT cleavage. Lane 1 shows molecular size markers, lane 2 is the result of dinucleotide cleavage. All expected fragments of 6 nucleotides and greater are observed.

FIG. 26 shows the MALDI-TOF mass spectra of the fragments obtained from AT dinucleotide cleavage of the 257 nt fragment of the M13 mp18 DNA shown in FIG. 25.

FIGS. 27-33 demonstrate the application of mononucleotide cleavage to genotyping by mass spectrometry, capillary electrophoresis and FRET.

FIG. 27 is a schematic representation of genotyping by chemical restriction. The template is amplified using one cleavable nucleotide analog, dA*TP. The amplicons are chemically restricted to give fragments with the indicated length and mass differences. The fragments obtained can be analyzed by mass spectrometry or electrophoresis.

FIGS. 28A-C show the steps in genotyping a polynucleotide by mass spectrometry: (A) shows the PCR amplification of an 82 bp fragment of transferrin receptor and indicates the site of polymorphism; (B) indicates the amplification in the presence of a modified nucleotide, dA*TP, the structure of which is shown; and (C) is a gel comparing amplification with unmodified nucleotide and with modified nucleotide and shows that full substitution with modified nucleotide is compatible with efficient PCR amplification.

FIGS. 29A-B illustrate genotyping by detection of mass differences obtained from the amplification and cleavage of the variant forms of transferrin receptor. Only the fragments that illustrate the length and mass differences among the fragments of the same (invariant) and different (variant) alleles are shown.

FIG. 30A is another illustration of genotyping by mass spectrometry. The spectrum is a MALDI-TOF analysis of a chemically restricted DNA fragment. The boxed areas are regions that contain fragments with polymorphism.

FIG. 30B is another illustration of genotyping by mass spectrometry, this time looking at length differences. The spectra constitute a MALDI-TOF comparison of chemically restricted primer fragments of homozygote and heterozygote samples. The figure shows the mass spectra of three genotypes in the region of 7000 Da to 9200 Da.

FIG. 31A illustrates genotyping by mass spectrometry wherein mass differences are detected. The spectrum is the result of a MALDI-TOF analysis of a heterozygote sample that has been chemically restricted in the presence of tris(2-carboxymethyl)phosphine and piperidine.

FIG. 31B shows the proposed chemical structure of the cleavage product obtained under the conditions indicated in FIG. 31A.

FIGS. 32A-B illustrate genotyping by chemical cleavage followed by electrophoresis. In (A) the capillary electrophoresis analysis of a chemically restricted polymorphic DNA fragment is depicted. In (B), the denaturing 20% PAGE analysis of the chemically restricted amplicon is shown.

FIGS. 33A-D illustrate genotyping by fluorescence resonance energy transfer (FRET): (A) the template is amplified using one modified, cleavable nucleotide (dA*TP), and primer 1 is modified with a fluor, F1; (B) after cleavage, a probe modified with a second fluor, F2, and complementary to primer 1 is added; (C) at elevated temperature, the allele shortened by cleavage is not bound to the probe (and, therefore, no FRET is produced) while the uncleaved allele remains bound, giving a FRET; (D) shows a means for positive detection of the short fragment by modifying the probe to contain a hairpin and an additional fluor, F3. The hairpin will open only after binding with the longer, uncleaved fragment, resulting in a difference in FRET production.

FIGS. 34A and B illustrate hybridization specific detection based on melting point differences where the oligonucleotide capture probe and the primer completely overlap.

FIG. 35 illustrates that the capture oligonucleotide probe may also only partially overlap the relevant sequence. In this case, the T allele fragment alone can be detected by using an annealing temperature above the melting temperature of the G allele fragment/capture probe duplex, which will denature.
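The choice of an annealing temperature that discriminates between two duplexes of different length can be illustrated with a rough estimate. The sketch below uses the Wallace rule (2° C. per A·T pair and 4° C. per G·C pair), a coarse approximation for short oligonucleotides that is introduced here as an assumption and is not part of the method described. The sequences and the extent of overlap are hypothetical.

```python
def wallace_tm(duplex_seq: str) -> int:
    """Rough melting temperature (deg C) of a short duplex by the Wallace rule."""
    at_pairs = sum(duplex_seq.count(base) for base in "AT")
    gc_pairs = sum(duplex_seq.count(base) for base in "GC")
    return 2 * at_pairs + 4 * gc_pairs


# Hypothetical region of a capture probe target (18 nt).
PROBE_TARGET = "ACGTTGCAAGCTTACGGA"

# Assumption for illustration: the T allele fragment pairs with the whole
# region, while the shorter G allele fragment pairs with only the first 10 nt.
T_ALLELE_OVERLAP = PROBE_TARGET
G_ALLELE_OVERLAP = PROBE_TARGET[:10]

tm_t = wallace_tm(T_ALLELE_OVERLAP)
tm_g = wallace_tm(G_ALLELE_OVERLAP)
print(f"T allele duplex: about {tm_t} C; G allele duplex: about {tm_g} C")
print(f"an annealing temperature between {tm_g} C and {tm_t} C would favor the T allele")
```

In practice melting temperatures depend on salt concentration, probe concentration and nearest-neighbor effects, so an estimate of this kind would only be a starting point for empirical optimization.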

FIG. 36 illustrates that the capture oligonucleotide probe for hybridization detection methods may be designed to hybridize to an internal fragment, rather than the 5′ terminal fragment. Again, detection of the T allele alone can be accomplished by using an annealing temperature higher than the melting temperature of the G allele/capture oligonucleotide duplex.

FIG. 37 illustrates incorporation of a modified, cleavable nucleotide, dG^(m)TP, and a labeled nucleotide, dA*TP. As shown, only the T allele is detectable by the capture probe since the labeled adenine (A*) survives only in that allele.

FIG. 38 illustrates incorporation of a modified, cleavable nucleotide and a labeled nucleotide similar to the method depicted in FIG. 36. In this method, a modified, cleavable T nucleotide (dT^(m)TP) is incorporated instead of a modified G as in FIG. 36. Here, only the G allele is detectable since only the capture probe/G allele fragment duplex retains the labeled A*.

FIG. 39 illustrates a mechanism by which a label can be incorporated in the amplified fragment using secondary amines. In the figure, R¹, R² or a combination thereof would provide the detectable label.

FIG. 40 illustrates the application of fluorescence resonance energy transfer for detection of the allelic differences. Two primers are used to amplify the region containing the polymorphic site. The amplified fragments are then subjected to chemical cleavage. A dye molecule (F1) is appended to the chemical cleavage fragments coincidentally with cleavage or subsequent to cleavage. The capture probe contains a second dye, F2 (or two capture probes may be used, each of which includes the second dye or one of which contains a third dye, F3). Since the cleavage reaction will result in fragments of differing lengths, and thus differing proximities of the F1 label to the F2 or F3 label of the A allele or G allele, respectively, a predictable and detectable difference in the FRET will be observed.

FIG. 41A illustrates the application of allele specific chemical cleavage using cleavable ribonucleotides. Since, in most ribonucleotide cleavage reactions, the base remains intact in the cleavage product, a label can be attached to the base as shown in the inset. Cleavage and subsequent capture of the fragments followed by fluorescence detection results in identification of the two alleles.

FIG. 41B shows exemplary, but in no way limiting, chemical structures of labeled cleavable ribonucleotides.

FIG. 42 illustrates the application of an immobilized primer. Detection of the fragments corresponding to the two alleles can be accomplished using FRET, as shown in FIG. 39, or simple fluorescence detection as shown in FIG. 40. To create a FRET, the immobilized fragment must be hybridized with an oligonucleotide probe carrying the second dye molecule.

FIG. 43 illustrates the intramolecular specific methods that are based on incorporation of multiple labels during the PCR amplification reaction. In the figure, N represents any nucleotide, the PCR primer is underlined, G^(m) is a modified, cleavable G and A* and C* are labeled nucleotides. The possible FRET detection results are shown in the box.

FIG. 44 illustrates three methods of single nucleotide polymorphism (SNP) detection using hair-pin loop formation in the chemical cleavage fragments to create a FRET. In each panel, the site of polymorphism is bolded, N represents any nucleotide, and the 3′ primer is not shown.

FIG. 44A illustrates the use of a single modified, cleavable nucleotide, dG^(m)TP, and a 5′ primer bearing a dye molecule (G*). The primer has a sequence on the 5′ end that is complementary to the 3′ sequence of the amplicon region nearest to the site of polymorphism. The chemical cleavage fragments incorporate a second dye on two different nucleotides (or a different dye on each nucleotide). The fragments are incubated under conditions selected for hair-pin loop formation as shown in Step 3e. FRET detection possibilities are shown in the inset table where Acc, Donor, and Donor/Acc represent the acceptor, donor, and donor or acceptor molecule emission wavelengths.

FIG. 44B illustrates the use of the modified, cleavage nucleotide dAmTPinstead of dGmTP.

FIG. 44C illustrates the use of two different 5′ primers for PCRamplification. As shown, the short primer may be considered identical tothe 5′ primer shown in FIGS. 44A and 44B. The long primer extends the 5,end of the 5′ primer and the label again occurs on the 5′ end base, inthe figure T*.

FIG. 45 illustrates another SNP detection method using PCRamplification, chemical cleavage, hair-pin loop formation and FRETdetection. As shown, PCR amplification includes a modified cleavablenucleotide dG^(m)TP. N represents any polynucleotide. The 5′ PCR primeris designed to have a 5′ end base label (A*) and two regions to formduplexes; one region near to the 5′ end (A*AAA and TTTT) and one regiondownstream from the 5′ primer end, but 5′ from the site of polymorphism.As above, chemical cleavage and subsequent hair-pin loop formationresults in the products shown in Step 3. The inset shows the possibledetectable FRET signals.

FIG. 46 depicts the fluorescence of the incorporated label in Example 7. The samples that did not have 12-dUTP did not incorporate label (lanes 1 and 6). In contrast, the 12-dUTP was incorporated in the presence of either modified dATP (7-nitro-7-deaza-dATP, lanes 2-5) or modified dCTP (5-OH-dCTP, lanes 7-10).

FIG. 47 shows the efficacy of the PCR reaction when the reaction includes modified nucleotides and fluorescence labeled dUTP. The agarose gel, after electrophoresis and visualization using a UV transilluminator, was stained with ethidium bromide for visualization of the PCR amplified reaction products. The amplified DNA remained consistent in lanes 1-5, wherein labeled dUTP and 7-nitro-7-deaza-dATP were included, as well as in lanes 6-10, which included labeled dUTP and 5-OH-dCTP.

FIG. 48 shows the region of the P450 2D6 gene amplified in the PCR reaction as described in Example 7. The primers are underlined and the sites for modified base incorporation are indicated by an “m”; sites of incorporation of labeled dUTP are indicated by an “*”. The labeled expected 23 mer and the 34 mer are underlined.

FIG. 49 is the chromatogram of the ABI 377 PCR amplified and cleaved fragments. The expected labeled 23 mer and 34 mer fragments can easily be identified.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

As used herein, a “chemical method” refers to a combination of one or more modified nucleotides and one or more reagents which, when the modified nucleotide(s) is incorporated into a polynucleotide by partial or complete substitution for a natural nucleotide and the modified polynucleotide is subjected to the reagent(s), results in the selective cleavage of the modified polynucleotide at the point(s) of incorporation of the modified nucleotide(s).

By “analysis” is meant either detection of variance in the nucleotide sequence among two or more related polynucleotides or, in the alternative, the determination of the full nucleotide sequence of a polynucleotide.

By “reagent” is meant a chemical or physical force which causes the cleavage of a modified polynucleotide at the point of incorporation of a modified nucleotide in place of a natural nucleotide; such a reagent may be, without limitation, oligonucleotides, since the PCR products should contain equal amounts of DNA amplified from the two chromosomes carrying the “A₁” and “A₂” alleles.

In practice, it is often difficult to design oligonucleotide probes that can reproducibly and robustly discriminate between different SNP alleles in PCR products amplified from genomic DNA and other DNA samples, because the hybridization signal from the perfectly-matched duplex may not differ sufficiently from that produced by a duplex carrying a single mismatch. What is needed, then, is a simple, low cost, rapid and robust, yet sensitive and accurate, method for detecting polymorphisms, in particular single nucleotide polymorphisms, in a polynucleotide (DNA or RNA). The present invention provides such a method.

SUMMARY OF THE INVENTION

Thus, in one aspect, this invention relates to a method for detecting polymorphism in a polynucleotide, comprising providing a polynucleotide suspected of containing a polymorphism; amplifying a segment of the polynucleotide encompassing the suspected polymorphism wherein amplification comprises replacing one or more natural nucleotide(s), one of which is a nucleotide involved in the suspected polymorphism, at substantially each point of occurrence in the segment with a modified nucleotide or, if more than one natural nucleotide is replaced, with different modified nucleotides to form an amplified modified segment; cleaving the amplified modified segment into fragments by contacting it with a reagent or reagents that cleave(s) the segment at substantially each point of occurrence of the modified nucleotide(s); hybridizing the fragments to an oligonucleotide; and, analyzing the hybridized fragments for an incorporated detectable label identifying the suspected polymorphism.
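The cleavage step of this aspect can be illustrated with a short sketch. It is a simplification under stated assumptions, not a description of any particular chemistry: the segment is treated as a single strand written 5′ to 3′, every occurrence of the replaced base is taken to be modified, cleavage is taken to be complete, and the modified nucleotide is assumed to be lost from both flanking fragments. The function name `cleave_at` and the two allele sequences are hypothetical.

```python
def cleave_at(segment: str, modified_base: str) -> list[str]:
    """Split a segment at every occurrence of the modified base.

    Assumption: the modified nucleotide itself is lost on cleavage,
    so it appears in none of the returned fragments.
    """
    return [frag for frag in segment.split(modified_base) if frag]


# Hypothetical alleles that differ at a single position (T versus G).
allele_1 = "ACTTGACCTTACTTCA"
allele_2 = "ACTTGACCGTACTTCA"

# Replace G with a modified, cleavable G in both alleles, then cleave.
fragments_1 = cleave_at(allele_1, "G")
fragments_2 = cleave_at(allele_2, "G")

# Fragments obtained from only one allele identify the polymorphism.
unique_to_1 = sorted(set(fragments_1) - set(fragments_2))
unique_to_2 = sorted(set(fragments_2) - set(fragments_1))
```

Because G is the nucleotide involved in the hypothetical polymorphism, the second allele gains a cleavage site and yields two shorter fragments where the first allele yields one longer fragment.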

In another aspect this invention relates to a method for detecting polymorphism in a polynucleotide, comprising amplifying a segment of the polynucleotide encompassing the suspected polymorphism wherein amplification comprises replacing a natural nucleotide that is involved in the suspected polymorphism at substantially each point of occurrence in the segment with a modified nucleotide to form an amplified modified segment; cleaving the amplified modified segment into fragments by contacting it with a reagent or reagents that cleave(s) the segment at substantially each point of occurrence of the modified nucleotide(s); hybridizing the fragments to an oligonucleotide which forms duplexes with the fragments that have different melting temperatures; subjecting the duplexes to a temperature that is above the melting temperature of at least one duplex; and, analyzing the remaining duplexes for an incorporated label identifying the suspected polymorphism.

In another aspect of this invention, the detectable label is incorporated during amplification.

In a further aspect of this invention, incorporating the detectable label during amplification comprises using a detectably labeled primer.

In an aspect of this invention, the detectably labeled primer comprises a radioactive primer or a primer containing a fluorophore.

In a still further aspect of this invention, incorporating the detectable label during amplification comprises using a detectably labeled, modified nucleotide.

In another aspect of this invention, the detectably labeled, modified nucleotide comprises a radioactive modified nucleotide or a modified nucleotide containing a fluorophore.

In an aspect of this invention, the detectably labeled, modified nucleotide is a detectably labeled, modified ribonucleotide.

In an aspect of this invention, the detectably labeled, modified ribonucleotide comprises a radioactive modified ribonucleotide or a modified ribonucleotide containing a fluorophore.

In still another aspect of this invention, incorporating the detectable label during amplification comprises replacing a natural nucleotide, that is different than the natural nucleotide(s) being replaced with a modified nucleotide(s), at one or more point(s) of occurrence in the segment with a detectably labeled nucleotide.

In yet another aspect of this invention, the detectably labeled nucleotide comprises a radioactive nucleotide or a nucleotide containing a fluorophore.

In a further aspect of this invention, the detectably labeled nucleotide comprises a detectably labeled ribonucleotide.

In another aspect of this invention, the detectably labeled ribonucleotide comprises a radioactive ribonucleotide or a ribonucleotide containing a fluorophore.

In an aspect of this invention, the detectable label is incorporated during cleavage.

In a further aspect of this invention, incorporating the detectable label during cleavage comprises using detectably labeled tris(carboxyethyl)phosphine (TCEP).

In a still further aspect of this invention, using detectably labeled TCEP comprises using radioactive TCEP or TCEP containing a fluorophore.

In another aspect of this invention, incorporating the detectable label during cleavage comprises using a detectably labeled secondary amine.

In yet another aspect of this invention, using a detectably labeled secondary amine comprises using a radioactive secondary amine or a secondary amine containing a fluorophore.

In an aspect of this invention, the detectable label is incorporated during hybridization.

In an aspect of this invention, incorporating the detectable label during hybridization comprises hybridizing a second, detectably labeled oligonucleotide to the fragments hybridized to the oligonucleotide.

In a further aspect of this invention, the second, detectably labeled oligonucleotide comprises a radioactive oligonucleotide or an oligonucleotide containing a fluorophore.

In another aspect of this invention, the detectable label is incorporated after cleavage or after hybridization, the method comprising cleaving using a reagent comprising TCEP or a secondary amine; and, substituting the TCEP or secondary amine with a radioactive molecule or a fluorophore after cleavage or after hybridization.

In a further aspect of this invention the polymorphism is selected from the group consisting of a single nucleotide polymorphism (SNP), a deletion or an insertion.

In a still further aspect of this invention, amplifying the segment comprises a polymerase chain reaction (PCR).

In an aspect of this invention, amplifying the segment comprises replacing one natural nucleotide that is involved in the suspected polymorphism at each point of occurrence in the segment with a modified nucleotide to form a modified segment.

In a further aspect of this invention, the above-modified nucleotide comprises a labeled, modified nucleotide.

In another aspect of this invention, the above-labeled modified nucleotide comprises a radioactive modified nucleotide or a modified nucleotide containing a fluorophore.

In an aspect of this invention, the above-modified nucleotide comprises a modified ribonucleotide.

In still another aspect of this invention, the modified nucleotide comprises a labeled, modified ribonucleotide.

In an aspect of this invention, the labeled, modified ribonucleotide comprises a radioactive ribonucleotide or a ribonucleotide containing a fluorophore.

In another aspect of this invention, hybridizing the fragments to an oligonucleotide comprises using an oligonucleotide that is immobilized on a solid support.

In an aspect of this invention, the incorporated detectable label comprises fluorescence resonance energy transfer (FRET).

An aspect of this invention is a compound having the chemical structure:

wherein R¹ is selected from the group consisting of:

A compound having the chemical structure:

wherein said “Base” is selected from the group consisting of cytosine, guanine, inosine and uracil is another aspect of this invention.

Another aspect of this invention is a compound having the chemical structure:

wherein said “Base” is selected from the group consisting of adenine, cytosine, guanine, inosine and uracil.

A still further aspect of this invention is a compound having the chemical structure:

wherein said “Base” is selected from the group consisting of adenine, cytosine, guanine, inosine, thymine and uracil.

A polynucleotide comprising a dinucleotide sequence selected from the group consisting of:

wherein each “Base” is independently selected from the group consisting of adenine, cytosine, guanine and thymine; W is an electron withdrawing group; and, X is a leaving group is also an aspect of this invention. The electron withdrawing group is selected from the group consisting of F, Cl, Br, I, NO₂, C≡N, —C(O)OH and OH in another aspect of this invention and, in a still further aspect, the leaving group is selected from the group consisting of Cl, Br, I and OTs.

An aspect of this invention is a method for synthesizing a polynucleotide comprising mixing a compound having the chemical structure:

wherein R¹ is selected from the group consisting of:

with adenosine triphosphate, guanosine triphosphate, and thymidine or uridine phosphate in the presence of one or more polymerases is, too, an aspect of this invention.

A method for synthesizing a polynucleotide comprising mixing a compound having the chemical structure:

wherein R¹ is selected from the group consisting of:

with adenosine triphosphate, cytidine triphosphate and guanosine triphosphate in the presence of one or more polymerases is also an aspect of this invention.

A method for synthesizing a polynucleotide, comprising mixing a compound having the chemical structure:

wherein R¹ is selected from the group consisting of:

with cytidine triphosphate, guanosine triphosphate, and thymidine triphosphate in the presence of one or more polymerases is a further aspect of this invention.

An aspect of this invention is a method for synthesizing a polynucleotide, comprising mixing a compound having the chemical structure:

wherein R¹ is selected from the group consisting of:

with adenosine triphosphate, cytidine triphosphate and thymidine triphosphate in the presence of one or more polymerases.

Another aspect of this invention is a method for synthesizing a polynucleotide, comprising mixing a compound selected from the group consisting of:

-   a compound having the chemical structure:
    wherein said “Base” is selected from the group consisting of cytosine, guanine, inosine and uracil;
-   a compound having the chemical structure:
    wherein said “Base” is selected from the group consisting of adenine, cytosine, guanine, inosine and uracil; and
-   a compound having the chemical structure:
    wherein the “Base” is selected from the group consisting of adenine, cytosine, guanine or inosine, and thymine or uracil,

with whichever three of the four nucleoside triphosphates, adenosine triphosphate, cytidine triphosphate, guanosine triphosphate and thymidine triphosphate, do not contain said base (or its substitute), in the presence of one or more polymerases.

Another aspect of this invention is a method for synthesizing a polynucleotide, comprising mixing one of the following pairs of compounds:

wherein:

-   Base₁ is selected from the group consisting of adenine, cytosine, guanine or inosine, and thymine or uracil;
-   Base₂ is selected from the group consisting of the remaining three bases which are not Base₁;
-   R³ is O⁻—P(═O)(O⁻)—O—P(═O)(O⁻)—O—P(═O)(O⁻)—O—;
-   W is an electron-withdrawing group;
-   X is a leaving group;
-   a second W or X shown in parentheses on the same carbon atom means that a single W or X group can be in either position on the sugar or both W or both X groups can be present at the same time; and,
-   R is an alkyl group;

with whichever two of the four nucleoside triphosphates, adenosine triphosphate, cytidine triphosphate, guanosine triphosphate and thymidine triphosphate, do not contain Base₁ or Base₂ (or their substitutes), in the presence of one or more polymerases.
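The rule for choosing the natural nucleoside triphosphates in the synthesis methods above (three when one base is replaced, two when a pair is replaced) can be sketched as a simple lookup. This is a minimal illustration; the function name `remaining_triphosphates` and the dictionary names are hypothetical. The only facts assumed are those stated in this specification: inosine substitutes for guanine and uracil substitutes for thymine.

```python
NATURAL_BASES = ("adenine", "cytosine", "guanine", "thymine")

# Substitutes named in this specification, mapped to the base they replace.
SUBSTITUTE_FOR = {"inosine": "guanine", "uracil": "thymine"}

TRIPHOSPHATE = {
    "adenine": "adenosine triphosphate",
    "cytosine": "cytidine triphosphate",
    "guanine": "guanosine triphosphate",
    "thymine": "thymidine triphosphate",
}


def remaining_triphosphates(*modified_bases: str) -> list[str]:
    """List the natural triphosphates that do not contain a replaced base.

    Each argument is the base carried by a modified nucleotide; a
    substitute base counts as the natural base it stands in for.
    """
    replaced = {SUBSTITUTE_FOR.get(base, base) for base in modified_bases}
    return [TRIPHOSPHATE[b] for b in NATURAL_BASES if b not in replaced]
```

For example, `remaining_triphosphates("inosine")` omits guanosine triphosphate, and `remaining_triphosphates("adenine", "uracil")` leaves only cytidine and guanosine triphosphate.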

An aspect of this invention is a polymerase that is capable of catalyzing the incorporation of a modified nucleotide into a polynucleotide wherein said modified nucleotide does not contain ribose as its only modifying characteristic. The above polymerase obtained by a process comprising DNA shuffling is another aspect of this invention.

The process including DNA shuffling can comprise the following steps:

a. selecting one or more known polymerase(s);
b. performing DNA shuffling;
c. transforming shuffled DNA into a host cell;
d. growing host cell colonies;
e. forming a lysate from said host cell colony;
f. adding a DNA template containing a detectable reporter sequence, the modified nucleotide or nucleotides whose incorporation into a polynucleotide is desired and the natural nucleotides not being replaced by said modified nucleotide(s); and,
g. examining the lysate for the presence of the detectable reporter.

The process including DNA shuffling can also comprise:

a. selecting a known polymerase or two or more known polymerases having different structures or different catalyzing capabilities or both;
b. performing DNA shuffling;
c. transforming said shuffled DNA into a host to form a library of transformants in host cell colonies;
d. preparing first separate pools of said transformants by plating said host cell colonies;
e. forming a lysate from each said first separate pool of host cell colonies;
f. removing all natural nucleotides from each said lysate;
g. combining each said lysate with:
    -   a single-stranded DNA template comprising a sequence corresponding to an RNA polymerase promoter followed by a reporter sequence;
    -   a single-stranded DNA primer complementary to one end of said template;
    -   the modified nucleotide or nucleotides whose incorporation into said polynucleotide is desired;
    -   each natural nucleotide not being replaced by said modified nucleotide or nucleotides;
h. adding RNA polymerase to each said combined lysate;
i. examining each said combined lysate for the presence of said reporter sequence;
j. creating second separate pools of transformants in host cell colonies from each said first separate pool of host cell colonies in which the presence of said reporter is detected;
k. forming a lysate from each said second separate pool of host cell colonies;
l. repeating steps g, h, i, j, k and l to form separate pools of transformants in host cell colonies until only one host cell colony remains which contains said polymerase; and,
m. recloning said polymerase from said one host cell colony into a protein expression vector.
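The iterative pooling of steps d through l can be pictured as a subdivision search over colonies. The sketch below is only an illustration under stated assumptions: `is_positive` is a hypothetical stand-in for the reporter assay of steps g to i applied to a pooled lysate, pools are split into roughly equal parts, and only the first positive pool found in each round is followed. The function name `sib_select` and the `pools` parameter are likewise hypothetical.

```python
from typing import Callable, Optional, Sequence


def sib_select(
    colonies: Sequence[str],
    is_positive: Callable[[Sequence[str]], bool],
    pools: int = 4,
) -> Optional[str]:
    """Narrow a set of colonies down to a single reporter-positive colony.

    Each round splits the current pool into sub-pools, assays each one,
    and keeps the first sub-pool in which the reporter is detected.
    Returns None if no pool ever tests positive.
    """
    if pools < 2:
        raise ValueError("at least two sub-pools are needed per round")
    current = list(colonies)
    while len(current) > 1:
        size = -(-len(current) // pools)  # ceiling division
        subpools = [current[i:i + size] for i in range(0, len(current), size)]
        positive = next((p for p in subpools if is_positive(p)), None)
        if positive is None:
            return None
        current = positive
    return current[0] if current and is_positive(current) else None
```

With 20 hypothetical colonies and four sub-pools per round, a single positive colony is isolated in three rounds of pooling rather than by assaying all 20 colonies one at a time.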

A polymerase which is capable of catalyzing the incorporation of a modified nucleotide into a polynucleotide, obtained by a process comprising cell senescence selection is another aspect of this invention.

The cell senescence selection process can comprise the following steps:

a. mutagenizing a known polymerase to form a library of mutant polymerases;
b. cloning said library into a vector;
c. transforming said vector into host cells selected so as to be susceptible to being killed by a selected chemical only when said cell is actively growing;
d. adding a modified nucleotide;
e. growing said host cells;
f. treating said host cells with said selected chemical;
g. separating living cells from dead cells; and,
h. isolating said polymerase or polymerases from said living cells.

The cell senescence selection process can also comprise steps including:

a. mutagenizing a known polymerase to form a library of mutant polymerases;
b. cloning said library of mutant polymerases into a plasmid vector;
c. transforming with said plasmid vector bacterial cells that, when growing, are susceptible to an antibiotic;
d. selecting transfectants using said antibiotic;
e. introducing a modified nucleotide, as the corresponding nucleoside triphosphate, into the bacterial cells;
f. growing the cells;
g. adding an antibiotic, which will kill bacterial cells that are actively growing;
h. isolating said bacterial cells;
i. growing said bacterial cells in fresh medium containing no antibiotic;
j. selecting live cells from growing colonies;
k. isolating said plasmid vector from said live cells;
l. isolating said polymerase; and,
m. assaying said polymerase.

Repeating steps c to k of the above process one or more additional times before proceeding to step l is another aspect of this invention.

That the polymerase obtained in the above methods be a heat stable polymerase is another aspect of this invention.

A final aspect of this invention is a kit, comprising:

-   one or more modified nucleotides;
-   one or more polymerases capable of incorporating said one or more modified nucleotides in a polynucleotide to form a modified polynucleotide; and,
-   a reagent or reagents capable of cleaving said modified polynucleotide at each point of occurrence of said one or more modified nucleotides in said polynucleotide.

BRIEF DESCRIPTION OF THE TABLES

Table 1 illustrates several procedures presently in use for the detection of variance in DNA.

A reagent may be, without limitation, a chemical or combination of chemicals, normal or coherent (laser) visible or UV light, heat, high energy ion bombardment and irradiation. In addition, a reagent may consist of a protein such as, without limitation, a polymerase.

“Related” polynucleotides are polynucleotides obtained from genetically similar sources such that the nucleotide sequence of the polynucleotides would be expected to be exactly the same in the absence of a variance or there would be expected to be a region of overlap that, in the absence of a variance, would be exactly the same, where the region of overlap is greater than 35 nucleotides.

A “variance” is a difference in the nucleotide sequence among related polynucleotides. The difference may be the deletion of one or more nucleotides from the sequence of one polynucleotide compared to the sequence of a related polynucleotide, the addition of one or more nucleotides or the substitution of one nucleotide for another. The terms “mutation,” “polymorphism” and “variance” are used interchangeably herein. As used herein, the term “variance” in the singular is to be construed to include multiple variances; i.e., two or more nucleotide additions, deletions and/or substitutions in the same polynucleotide. A “point mutation” refers to a single substitution of one nucleotide for another.

As used herein, a “single nucleotide polymorphism” or “SNP” refers to a polynucleotide that differs from another polynucleotide by a single nucleotide exchange. For example, without limitation, exchanging one A for one C, G or T in the entire sequence of a polynucleotide constitutes a SNP. Of course, it is possible to have more than one SNP in a particular polynucleotide. For example, at one locus in a polynucleotide, a C may be exchanged for a T, at another locus a G may be exchanged for an A and so on. When referring to SNPs, the polynucleotide is most often DNA and the SNP is one that usually results in a deleterious change in the genotype of the organism in which the SNP occurs.

By “being suspected of containing a polymorphism” is meant that the polynucleotide, usually DNA or RNA, being subjected to the method of this invention is one of known sequence, that sequence being known to be capable of containing a particular polymorphism at a known locus in the sequence.

By “amplifying a segment” as used herein, is meant the production of sufficient multiple copies of the segment to permit relatively facile manipulation of the segment. Manipulation refers to both physical and chemical manipulation, that is, the ability to move bulk quantities of the segment around and to conduct chemical reactions with the segment that result in detectable products.

A “segment” of a polynucleotide refers to an oligonucleotide that is a partial sequence of the entire nucleotide sequence of the polynucleotide. A “modified segment” refers to a segment in which one or more natural nucleotides have been replaced with one or more modified nucleotides. A “modified, labeled segment” refers to a modified segment that also contains a nucleotide, which is different from the modified nucleotide or nucleotides therein, and which is detectably labeled.

“Encompassing the suspected polymorphism” means that the nucleotide or nucleotides that vary in the polynucleotide are included in the sequence of the selected segment of the polynucleotide.

By “homozygous” is meant that the two alleles of a diploid cell or organism at a given locus are identical, that is, that they have the same nucleotide for nucleotide exchange at the same place in their sequences.

By “heterozygous” or “heterozygous polymorphism” is meant that the two alleles of a diploid cell or organism at a given locus are different, that is, that they have a different nucleotide exchanged for the same nucleotide at the same place in their sequences.
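The two definitions above reduce to a comparison of the nucleotides carried by the two alleles at the locus of interest. The following minimal sketch makes that explicit; the function name `zygosity` and the single-letter allele representation are illustrative assumptions only.

```python
def zygosity(allele_a: str, allele_b: str) -> str:
    """Classify a diploid locus from the nucleotide on each allele."""
    same = allele_a.upper() == allele_b.upper()
    return "homozygous" if same else "heterozygous"


# Hypothetical genotypes at a single SNP locus.
calls = {pair: zygosity(*pair) for pair in [("T", "T"), ("T", "G"), ("G", "G")]}
```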

By “hybridization” or “hybridizing,” as used herein, is meant the formation of A-T and C-G base pairs between the nucleotide sequence of a fragment of a segment of a polynucleotide and a complementary nucleotide sequence of an oligonucleotide. By complementary is meant that at the locus of each A, C, G or T (or U in a ribonucleotide) in the fragment sequence, the oligonucleotide sequence has a T, G, C or A, respectively. The hybridized fragment/oligonucleotide is called a “duplex.”
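The base-pairing rule in this definition can be written out as a short sketch. It assumes the usual convention that both strands are written 5′ to 3′ and pair in antiparallel orientation, so the complementary oligonucleotide is the reverse complement of the fragment. The names `COMPLEMENT`, `complementary_oligo` and `forms_duplex` are hypothetical, and only perfect complementarity is modeled.

```python
# Pairing partner for each base in the fragment (U pairs like T).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def complementary_oligo(fragment: str) -> str:
    """Return the DNA oligonucleotide complementary to a fragment.

    Both sequences are written 5' to 3', so the result is the
    reverse complement of the input.
    """
    return "".join(COMPLEMENT[base] for base in reversed(fragment.upper()))


def forms_duplex(fragment: str, oligo: str) -> bool:
    """True if the oligonucleotide is perfectly complementary to the fragment."""
    return complementary_oligo(fragment) == oligo.upper()
```

A capture oligonucleotide for a hypothetical fragment `ACCTTA` would therefore be obtained with `complementary_oligo("ACCTTA")`.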

By “immobilized on a solid support” is meant that a fragment, primer or oligonucleotide is attached to a substance at a particular location in such a manner that the system containing the immobilized fragment, primer or oligonucleotide may be subjected to washing or other physical or chemical manipulation without being dislodged from that location. Examples, without limitation, of solid supports are polymeric beads in a vessel, the walls of a chromatography column, a filter paper and the like. A number of solid supports and means of immobilizing nucleotide-containing molecules to them are known in the art; any of these supports and means may be used in the methods of this invention. As used herein, immobilization is used to separate fragments resulting from the cleavage of a segment containing a polymorphism from those fragments not associated with a polymorphism. Fragments resulting from the cleavage of a segment containing a polymorphism refers to specific fragments that would not otherwise be formed if the polymorphism were not present in the segment. This is demonstrated in FIGS. 33 to 44 where it can be seen that, absent the indicated polymorphism, the fragments shown would not be obtained.

By “melting temperature” is meant the temperature at which hybridized duplexes dehybridize and return to their single-stranded state. Likewise, hybridization will not occur in the first place between two oligonucleotides, or, herein, an oligonucleotide and a fragment, at temperatures above the melting temperature of the resulting duplex. It is presently preferred that the difference in melting point temperatures of oligonucleotide-fragment duplexes of this invention be from about 1° C. to about 10° C. so as to be readily detectable.
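How a temperature step can discriminate between duplexes may be sketched with a rough estimate of melting temperature. The estimate used here is the Wallace rule of thumb for short oligonucleotides (about 2° C. per A or T and 4° C. per G or C); it is not taken from this specification, ignores salt and concentration effects, and is used only for illustration. The function names and fragment sequences are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C (Wallace rule, an assumption)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def remaining_duplexes(fragments: list[str], temperature_c: float) -> list[str]:
    """Fragments whose duplex is estimated to survive the given temperature."""
    return [f for f in fragments if wallace_tm(f) > temperature_c]


# Hypothetical fragments of different length bound to the same oligonucleotide.
fragments = ["ACCTTACTTCA", "TACTTCA"]
survivors = remaining_duplexes(fragments, temperature_c=24)
```

Choosing a temperature between the two estimates melts the shorter duplex while the longer one remains available for label detection.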

By “detectably labeled” is meant that a fragment or an oligonucleotide contains a nucleotide that is radioactive, that is substituted with a fluorophore or some other molecular species that elicits a physical or chemical response that can be observed by the naked eye or by means of instrumentation such as, without limitation, scintillation counters, calorimeters, UV spectrophotometers and the like.

By “analyzing” the hybridized fragments for an incorporated detectable label identifying the suspected polymorphism is meant that, at some stage of the sequence of events that leads to hybridized fragments, a label is incorporated. The label may be incorporated at virtually any stage of the sequence of events including the amplification, the cleavage or the hybridization procedures. The label may even be introduced into the sequence of events after cleavage but before hybridization or even after hybridization. The label so incorporated is then observed visually or by instrumental means. The presence of the label identifies the polymorphism due to the fact that the fragments obtained during cleavage are specific to the modified nucleotide(s) used in the amplification and at least one of the modified nucleotides is selected so as to replace a nucleotide involved in the polymorphism.

A “sequence” or “nucleotide sequence” refers to the order of nucleotide residues in a nucleic acid.

As noted above, one aspect of the chemical method of the present invention consists of modified nucleotides that can be incorporated into a polynucleotide in place of natural nucleotides.

A “nucleoside” refers to a base linked to a sugar. The base may be adenine (A), guanine (G) (or its substitute, inosine (I)), cytosine (C), or thymine (T) (or its substitute, uracil (U)). The sugar may be ribose (the sugar of a natural nucleotide in RNA) or 2-deoxyribose (the sugar of a natural nucleotide in DNA).

A “nucleoside triphosphate” refers to a nucleoside linked to a triphosphate group (O⁻—P(═O)(O⁻)—O—P(═O)(O⁻)—O—P(═O)(O⁻)—O-nucleoside). The triphosphate group has four formal negative charges that require counter-ions, i.e., positively charged ions. Any positively charged ion can be used, e.g., without limitation, Na⁺, K⁺, NH₄⁺, Mg²⁺, Ca²⁺, etc. Mg²⁺ is one of the most commonly used counter-ions. It is accepted convention in the art to omit the counter-ion, which is understood to be present, when displaying nucleoside triphosphates and that convention will be followed in this application.

As used herein, unless expressly noted otherwise, the term “nucleoside triphosphate” or reference to any specific nucleoside triphosphate; e.g., adenosine triphosphate, guanosine triphosphate or cytidine triphosphate, refers to the triphosphate made using either a ribonucleoside or a 2′-deoxyribonucleoside.

A “nucleotide” refers to a nucleoside linked to a single phosphate group.

A “natural nucleotide” refers to an A, C, G or U nucleotide when referring to RNA and to dA, dC, dG (the “d” referring to the fact that the sugar is a deoxyribose) and dT when referring to DNA. A natural nucleotide also refers to a nucleotide which may have a different structure from the above, but which is naturally incorporated into a polynucleotide sequence by the organism which is the source of the polynucleotide.

As used herein, inosine (I) refers to a purine ribonucleoside containing the base hypoxanthine.

As used herein, a “substitute” for a nucleoside triphosphate refers to a molecule in which a different nucleoside may be naturally substituted for A, C, G or T. Thus, inosine is a natural substitute for guanosine and uridine is a natural substitute for thymidine.

As used herein, a “modified nucleotide” is characterized by two criteria. First, a modified nucleotide is a “non-natural” nucleotide. In one aspect, a “non-natural” nucleotide may be a natural nucleotide that is placed in non-natural surroundings. For example, in a polynucleotide that is naturally composed of deoxyribonucleotides, a ribonucleotide would constitute a “non-natural” nucleotide when incorporated into that polynucleotide. Conversely, in a polynucleotide that is naturally composed of ribonucleotides, a deoxyribonucleotide incorporated into that polynucleotide would constitute a non-natural nucleotide. In addition, a “non-natural” nucleotide may be a natural nucleotide that has been chemically altered, for example, without limitation, by the addition of one or more chemical substituent groups to the nucleotide molecule, the deletion of one or more chemical substituent groups from the molecule or the replacement of one or more atoms or chemical substituents in the nucleotide for other atoms or chemical substituents. Finally, a “modified” nucleotide may be a molecule that resembles a natural nucleotide little, if at all, but is nevertheless capable of being incorporated by a polymerase into a polynucleotide in place of a natural nucleotide.

The second criterion by which a “modified” nucleotide, as the term is used herein, is characterized is that it alters the cleavage properties of the polynucleotide into which it is incorporated. For example, without limitation, incorporation of a ribonucleotide into a polynucleotide composed predominantly of deoxyribonucleotides imparts a susceptibility to alkaline cleavage, which does not exist in natural deoxyribonucleotides. This second criterion of a “modified” nucleotide may be met by a single non-natural nucleotide substituted for a single natural nucleotide (e.g., the substitution of ribonucleotide for deoxyribonucleotide described above) or by a combination of two or more non-natural nucleotides which, when subjected to selected reaction conditions, do not individually alter the cleavage properties of a polynucleotide but, rather, interact with one another to impose altered cleavage properties on the polynucleotide (termed “dinucleotide cleavage”).

When reference is made herein to the incorporation of a single modified nucleotide into a polynucleotide and the subsequent cleavage of the polynucleotide, the modified nucleotide cannot be a ribonucleotide in which the use of ribose as the sugar moiety is the only modifying characteristic of the modified nucleotide.

As used herein, a “modifying characteristic” as it relates to a modified nucleotide refers to the changes made to the chemical structure of a natural nucleotide to render it “modified.” As used herein, the characteristic may refer to a general change, i.e., base modification, sugar modification or phosphate linkage modification, or it may refer to a specific change, e.g., substituting 7-deaza-7-nitroadenine for adenine or making a 2′-fluoro derivative of the sugar moiety of a particular nucleotide.

“Having different cleavage characteristics” when referring to a modified nucleotide means that modified nucleotides incorporated into the same modified polynucleotide can be cleaved under reaction conditions which leave the sites of incorporation of each of the other modified nucleotides in that modified polynucleotide intact.

As used herein, a “label” or “tag” refers to a molecule that, when appended by, for example, without limitation, covalent bonding or hybridization, to another molecule, for example, also without limitation, a polynucleotide or polynucleotide fragment, provides or enhances a means of detecting the other molecule. A fluorescence or fluorescent label or tag emits detectable light at a particular wavelength when excited at a different wavelength. A radiolabel or radioactive tag emits radioactive particles detectable with an instrument such as, without limitation, a scintillation counter.

A molecule that absorbs light at one wavelength and then emits detectable light at a second wavelength comprises a fluorescent label as defined above and is referred to herein as a “fluorophore.”

A “mass-modified” nucleotide is a nucleotide in which an atom or chemical substituent has been added, deleted or substituted but such addition, deletion or substitution does not create modified nucleotide properties, as defined herein, in the nucleotide; i.e., the only effect of the addition, deletion or substitution is to modify the mass of the nucleotide.

A “polynucleotide” refers to a linear chain of nucleotides connected by a phosphodiester linkage between the 3′-hydroxyl group of one nucleoside and the 5′-hydroxyl group of a second nucleoside which in turn is linked through its 3′-hydroxyl group to the 5′-hydroxyl group of a third nucleoside and so on to form a polymer comprised of nucleosides linked by a phosphodiester backbone.

A “modified polynucleotide” refers to a polynucleotide in which one or more natural nucleotides have been partially or substantially completely replaced with modified nucleotides.

A “modified DNA fragment” refers to a DNA fragment synthesized under Sanger dideoxy termination conditions with one of the natural nucleotides other than the one which is partially substituted with its dideoxy analog being replaced with a modified nucleotide as defined herein. The result is a set of Sanger fragments; i.e., a set of fragments ending in ddA, ddC, ddG or ddT, depending on the dideoxy nucleotide used, with each such fragment also containing modified nucleotides (if, of course, the natural nucleotide corresponding to the modified nucleotide exists in that particular Sanger fragment).

As used herein, to “alter the cleavage properties” of a polynucleotide means to render the polynucleotide differentially cleavable or non-cleavable; i.e., resistant to cleavage, at the point of incorporation of the modified nucleotide relative to sites consisting of other non-natural or natural nucleotides. It is presently preferred to “alter the cleavage properties” by rendering the polynucleotide more susceptible to cleavage at the sites of incorporation of modified nucleotides than at any other sites in the molecule.

As used herein, the use of the singular when referring to nucleotide substitution is to be construed as including substitution at each point of occurrence of the natural nucleotide unless expressly noted to be otherwise.

As used herein, a “template” refers to a target polynucleotide strand, for example, without limitation, an unmodified naturally-occurring DNA strand, which a polymerase uses as a means of recognizing which nucleotide it should next incorporate into a growing strand to polymerize the complement of the naturally-occurring strand. Such DNA strand may be single-stranded or it may be part of a double-stranded DNA template. In applications of the present invention requiring repeated cycles of polymerization, e.g., the polymerase chain reaction (PCR), the template strand itself may become modified by incorporation of modified nucleotides, yet still serve as a template for a polymerase to synthesize additional polynucleotides.

A “primer” is a short oligonucleotide, the sequence of which is complementary to a segment of the template which is being replicated, and which the polymerase uses as the starting point for the replication process. By “complementary” is meant that the nucleotide sequence of a primer is such that the primer can form a stable hydrogen bond complex with the template; i.e., the primer can hybridize to the template by virtue of the formation of base-pairs over a length of at least ten consecutive base pairs.

As used herein, a “polymerase” refers, without limitation, to molecules such as DNA or RNA polymerases, reverse transcriptases, mutant DNA or RNA polymerases mutagenized by nucleotide addition, nucleotide deletion, one or more point mutations or the technique known to those skilled in the art as “DNA shuffling” (q.v., infra) or by joining portions of different polymerases to make chimeric polymerases. Combinations of these mutagenizing techniques may also be used. A polymerase catalyzes the polymerization of nucleotides to form polynucleotides. Methods are disclosed herein, and are aspects of this invention, for producing, identifying and using polymerases capable of efficiently incorporating modified nucleotides along with natural nucleotides into a polynucleotide. Polymerases may be used either to extend a primer once or repetitively or to amplify a polynucleotide by repetitive priming of two complementary strands using two primers. Methods of amplification include, without limitation, polymerase chain reaction (PCR), NASBA, SDA, 3SR, TSA and rolling circle replication. It is understood that, in any method for producing a polynucleotide containing given modified nucleotides, one or several polymerases or amplification methods may be used.

The selection of optimal polymerization conditions depends on the application. In general, a form of primer extension may be best suited to sequencing or variance detection methods that rely on dinucleotide cleavage and mass spectrometric analysis while either primer extension or amplification (e.g., PCR) will be suitable for sequencing methods that rely on electrophoretic analysis. Genotyping methods are best suited to production of polynucleotides by amplification. Either type of polymerization may be suitable for variance detection methods of this invention.

A “restriction enzyme” refers to an endonuclease (an enzyme that cleaves phosphodiester bonds within a polynucleotide chain) that cleaves DNA in response to a recognition site on the DNA. The recognition site (restriction site) consists of a specific sequence of nucleotides typically about 4-8 nucleotides long.

As used herein, “electrophoresis” refers to that technique known in the art as gel electrophoresis; e.g., slab gel electrophoresis, capillary electrophoresis and automated versions of these, such as the use of an automated DNA sequencer or a simultaneous multi-channel automated capillary DNA sequencer or electrophoresis in an etched channel such as that which can be produced in glass or other materials.

“Mass spectrometry” refers to a technique for mass analysis known in the art which includes, but is not limited to, matrix assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) mass spectrometry optionally employing, without limitation, time-of-flight, quadrupole or Fourier transform detection techniques. While the use of mass spectrometry constitutes a preferred embodiment of this invention, it will be apparent that other instrumental techniques are, or may become, available for the determination of the mass or the comparison of masses of oligonucleotides. An aspect of the present invention is the determination and comparison of masses and any such instrumental procedure capable of such determination and comparison is deemed to be within the scope and spirit of this invention.

As used herein, “FRET” refers to fluorescence resonance energy transfer, a distance dependent interaction between the electronic excited states of two dye molecules in which excitation is transferred from one dye (the donor) to another dye (the acceptor) without emission of a photon. A series of fluorogenic procedures have been developed to exploit FRET. In the present invention, the two dye molecules are generally located on opposite sides of a cleavable modified nucleotide such that cleavage with or without secondary structure formation will alter the proximity of the dyes to one another and thereby change the fluorescence output of the dyes on the resultant polynucleotide fragment products.
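The distance dependence referred to above can be illustrated with the standard Förster relation, in which transfer efficiency falls off with the sixth power of the donor-acceptor separation. The following is a minimal sketch only; the relation is general textbook photophysics and is not recited in this specification, and the function name and the 5 nm Förster radius are assumptions chosen for illustration.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Forster transfer efficiency E = 1 / (1 + (r / R0)^6).

    r_nm  -- donor-acceptor separation (nm)
    r0_nm -- Forster radius, the separation at which E = 0.5 (nm);
             it depends on the dye pair, so the value used below is
             an illustrative assumption.
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


if __name__ == "__main__":
    R0 = 5.0  # assumed Forster radius for a hypothetical dye pair
    # Dyes held close together on an intact strand versus dyes that
    # have moved apart after cleavage between them.
    for r in (2.5, 5.0, 10.0):
        print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r, R0):.3f}")
```

Because of the sixth-power dependence, a modest increase in separation after cleavage is enough to switch the transfer from nearly complete to nearly absent, which is what makes the appearance or disappearance of fluorescence a usable detection event.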

FRET can result in detectable quenching, differential light emission or depolarization. When the donor and acceptor are different species, quenching occurs when the donor absorbs light at its excitation wavelength and then, instead of emitting light at its emission wavelength, transfers some or all of its energy to the acceptor, which is itself not a fluorescing species. The normal emission of the donor is thus reduced or eliminated (quenched). On the other hand, if the acceptor is a fluorescing species, it may itself emit light at its characteristic emission wavelength, which is selected so as to be different from the emission wavelength of the donor. In this manner, quantitative differences in the emissions of the donor and acceptor can be detected and used to deduce information about the molecules to which they are attached.

If the same dye molecule is used as both the donor and acceptor, fluorescence depolarization can be used to detect changes in the molecules to which the dye is attached. Fluorescent depolarization occurs when the donor is excited with plane polarized light. If no energy is transferred to the second dye molecule, the light emitted by the donor will remain polarized. If, on the other hand, energy is transferred and it is the second dye molecule that emits light, that emitted light will be depolarized.

As used herein “construct a gene sequence” refers to the process of inferring partial or complete information about the DNA sequence of a subject polynucleotide by analysis of the masses of its fragments obtained by a cleavage procedure. The process of constructing a gene sequence generally entails comparison of a set of experimentally determined cleavage masses with the known or predicted masses of all possible polynucleotides that could be obtained from the subject polynucleotide given only the constraints of the modified nucleotide(s) incorporated in the polynucleotide and the chemical reaction mechanism(s) utilized, both of which impact the range of possible constituent masses. Various analytical deductions may then be employed to extract the greatest amount of sequence information from the masses of the cleavage fragments. More sequence information can generally be inferred when the subject polynucleotide is modified and cleaved, in separate reactions, by two or more modified nucleotides or sets of modified nucleotides because the range of deductions that may be made from analysis of several sets of cleavage fragments is greater.
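The comparison of an observed cleavage mass with the masses of all possible fragments can be sketched as a small enumeration. This is an illustration under stated assumptions, not the procedure of the invention: the residue masses are approximate average values for unmodified nucleotide residues, end-group and modified-nucleotide chemistry is ignored, and the function name, length limit and tolerance are hypothetical.

```python
from itertools import combinations_with_replacement

# Approximate average masses (Da) of unmodified DNA nucleotide residues.
# Assumed values for illustration; a real analysis would use masses that
# reflect the modified nucleotide and the cleavage chemistry employed.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def candidate_compositions(observed_mass, cleaved_base, max_len=6, tol=0.5):
    """List base compositions whose summed mass matches observed_mass.

    A fragment produced by cleavage at every occurrence of cleaved_base
    is assumed to contain none of that base, which narrows the search.
    Only composition is recovered, not the order of the bases.
    """
    allowed = [b for b in "ACGT" if b != cleaved_base]
    hits = []
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(allowed, n):
            mass = sum(RESIDUE_MASS[b] for b in combo)
            if abs(mass - observed_mass) <= tol:
                hits.append("".join(combo))
    return hits


if __name__ == "__main__":
    # A hypothetical fragment mass from a reaction that cleaves at A.
    print(candidate_compositions(1226.79, "A"))
```

The result is a composition (which bases, and how many of each) rather than an ordered sequence, which is why the text speaks of partial information and of combining several sets of cleavage fragments.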

As used herein, a “sequence ladder” is a collection of overlapping polynucleotides, prepared from a single DNA or RNA template, which share a common end, usually the 5′ end, but which differ in length because they terminate at different sites at the opposite end. The sites of termination coincide with the sites of occurrence of one of the four nucleotides, A, G, C or T/U, in the template. Thus the lengths of the polynucleotides collectively specify the intervals at which one of the four nucleotides occurs in the template DNA fragment. A set of four such sequence ladders, one specific for each of the four nucleotides, specifies the intervals at which all four nucleotides occur, and therefore provides the complete sequence of the template DNA fragment. As used herein, the term “sequence ladder” also refers to the set of four sequence ladders required to determine a complete DNA sequence. The process of obtaining the four sequence ladders to determine a complete DNA sequence is referred to as “generating a sequence ladder.”
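How four ladders together specify a complete sequence can be shown with a short sketch. It is a simplification made for illustration: the strand being read is represented directly as a string, fragment lengths are counted from the common 5′ end, and the function names and example sequence are hypothetical.

```python
def ladder(strand: str, base: str) -> list[int]:
    """Lengths of the fragments that share the 5' end of `strand` and
    terminate at each occurrence of `base`."""
    return [i + 1 for i, b in enumerate(strand) if b == base]


def reconstruct(ladders: dict[str, list[int]]) -> str:
    """Rebuild the sequence from one ladder per nucleotide.

    Each fragment length n says that position n holds that ladder's base.
    Positions not covered by any ladder are reported as '?'.
    """
    length = max(n for lengths in ladders.values() for n in lengths)
    seq = ["?"] * length
    for base, lengths in ladders.items():
        for n in lengths:
            seq[n - 1] = base
    return "".join(seq)


if __name__ == "__main__":
    strand = "GATTACAG"  # hypothetical example sequence
    ladders = {base: ladder(strand, base) for base in "ACGT"}
    print(ladders)
    print(reconstruct(ladders))
```

A single ladder gives only the spacing of one nucleotide; a position left as `?` after combining the available ladders corresponds to the kind of ambiguity that an additional experiment would have to resolve.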

As used herein, “cell senescence selection” refers to a process by which cells that are susceptible to being killed by a particular chemical only when the cells are actively growing; e.g., without limitation, bacteria which can be killed by antibiotics only when they are growing, are used to find a polymerase which will incorporate a modified nucleotide into a polynucleotide. The procedure requires that, when a particular polymerase which has been introduced into the cell line incorporates a modified nucleotide, that incorporation produces changes in the cells which cause them to senesce, i.e., to stop growing. When cell colonies, some members of which contain the modified nucleotide-incorporating polymerase and some members of which do not, are then exposed to the chemical, only those cells which do not contain the polymerase are killed. The cells are then placed in a medium where cell growth is reinitiated; i.e., a medium without the chemical or the modified nucleotide, and those cells that grow are separated and the polymerase isolated from them.

As used herein, a “chemical oxidant” refers to a reagent capable of increasing the oxidation state of a group on a molecule. For instance, without limitation, a hydroxyl group (—OH) can be oxidized to a keto group. For example and without limitation, potassium permanganate, t-butyl hypochlorite, m-chloroperbenzoic acid, hydrogen peroxide, sodium hypochlorite, ozone, peracetic acid, potassium persulfate, and sodium hypobromite are chemical oxidants.

As used herein, a “chemical base” refers to a chemical which, in aqueous medium, has a pK greater than 7.0. Examples of chemical bases are, without limitation, alkali (sodium, potassium, lithium) and alkaline earth (calcium, magnesium, barium) hydroxides, sodium carbonate, sodium bicarbonate, trisodium phosphate, ammonium hydroxide and nitrogen-containing organic compounds such as pyridine, aniline, quinoline, morpholine, piperidine and pyrrole. These may be used as aqueous solutions that may be mild (usually due to dilution) or strong (concentrated solutions). A chemical base also refers to a strong non-aqueous organic base; examples of such bases include, without limitation, sodium methoxide, sodium ethoxide and potassium t-butoxide.

As used herein, the term “acid” refers to a substance that dissociates on dissolution in water to produce one or more hydrogen ions. The acid may be inorganic or organic. The acid may be strong, which generally implies highly concentrated, or mild, which generally implies dilute. It is, of course, understood that acids inherently have different strengths; e.g., sulfuric acid is much stronger than acetic acid and this factor may also be taken into consideration when selecting the appropriate acid to use in conjunction with the methods described herein. The proper choice of acid will be apparent to those skilled in the art from the disclosures herein. Preferably, the acids used in the methods of this invention are mild. Examples of inorganic acids are, without limitation, hydrochloric acid, sulfuric acid, phosphoric acid, nitric acid and boric acid. Examples, without limitation, of organic acids are formic acid, acetic acid, benzoic acid, p-toluenesulfonic acid, trifluoroacetic acid, naphthoic acid, uric acid and phenol.

An “electron-withdrawing group” refers to a chemical group that, by virtue of its greater electronegativity, inductively draws electron density away from nearby groups and toward itself, leaving the less electronegative group with a partial positive charge. This partial positive charge, in turn, can stabilize a negative charge on an adjacent group thus facilitating any reaction that involves a negative charge, either formal or in a transition state, on the adjacent group. Examples of electron-withdrawing groups include, without limitation, cyano (—C≡N), azido (—N₃), nitro (—NO₂), halo (F, Cl, Br, I), hydroxy (—OH), thiohydroxy (—SH) and ammonium (—NH₃⁺).

An “electron withdrawing element,” as used herein, refers to an atom which is more electronegative than carbon so that, when placed in a ring, the atom draws electrons to it which, as with an electron-withdrawing group, results in nearby atoms being left with a partial positive charge. This renders the nearby atoms susceptible to nucleophilic attack. It also tends to stabilize, and therefore favor the formation of, negative charges on other atoms attached to the positively charged atom.

An “electrophile” or “electrophilic group” refers to a group which, when it reacts with a molecule, takes a pair of electrons from the molecule. Examples of some common electrophiles are, without limitation, iodine and aromatic nitrogen cations.

An “alkyl” group as used herein refers to a 1 to 20 carbon atom straight or branched, unsubstituted group. Preferably the group consists of a 1 to 10 carbon atom chain; most preferably, it is a 1 to 4 carbon atom chain. As used herein “1 to 20,” etc. carbon atoms means 1 or 2 or 3 or 4, etc. up to 20 carbon atoms in the chain.

A “mercapto” group refers to an —SH group.

An “alkylating agent” refers to a molecule that is capable of introducing an alkyl group into a molecule. Examples, without limitation, of alkylating agents include methyl iodide, dimethyl sulfate, diethyl sulfate, ethyl bromide and butyl iodide.

As used herein, the terms “selective,” “selectively,” “substantially,” “essentially,” “uniformly” and the like, mean that the indicated event occurs to a particular degree. In particular, the percent incorporation of a modified nucleotide is greater than 90%, preferably greater than 95%, most preferably, greater than 99% or the selectivity for cleavage at a modified nucleotide is greater than 10×, preferably greater than 25×, most preferably greater than 100× that of other nucleotides natural or modified, or the percent cleavage at a modified nucleotide is greater than 90%, preferably greater than 95%, most preferably greater than 99%.

As used herein, “diagnosis” refers to determining the nature of a disease or disorder. The methods of this invention may be used in any form of diagnosis including, without limitation, clinical diagnosis (a diagnosis made from a study of the signs and symptoms of a disease or disorder, where such sign or symptom is the presence of a variance), differential diagnosis (the determination of which of two or more diseases with similar symptoms is the one from which a patient is suffering), etc.

By “prognosis,” as used herein, is meant a forecast of the probable course and/or outcome of a disease. In the context of this invention, the methods described herein may be used to follow the effect of a genetic variance or variances on disease progression or treatment response. It is to be noted that using the methods of this invention as a prognostic tool does not require knowledge of the biological impact of a variance. The detection of a variance in an individual afflicted with a particular disorder or the statistical association of the variance with the disorder is sufficient. The progression or response to treatment of patients with a particular variance can then be traced throughout the course of the disorder to guide therapy or other disorder management decisions.

By “having a genetic component” is meant that a particular disease, disorder or response to treatment is known or suspected to be related to a variance or variances in the genetic code of an individual afflicted with the disease or disorder.

As used herein, an “individual” refers to any higher life form including reptiles and mammals, in particular human beings. However, the methods of this invention are useful for the analysis of the nucleic acids of any biological organism.

Discussion

In one aspect, this invention relates to a method for detecting a variance in the nucleotide sequence among related polynucleotides by replacing a natural nucleotide in a polynucleotide at substantially each point of incorporation of the natural nucleotide with a modified nucleotide, cleaving the modified polynucleotide at substantially each point of incorporation of the modified nucleotide, determining the mass of the fragments obtained and then comparing the masses with those expected from a related polynucleotide of known sequence or, if the sequence of a related polynucleotide is unknown, by repeating the above steps with a second related polynucleotide and then comparing the masses of the fragments obtained from the two related polynucleotides. Of course, it is understood that the methods of this invention are not limited to any particular number of related polynucleotides; as many as are needed or desired may be used.
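The steps of this aspect (replace, cleave, weigh, compare) can be sketched computationally. The sketch below is a simplified model rather than the chemistry of the invention: it assumes that cleavage removes the modified nucleotide and leaves the intervening fragments, it uses approximate average masses for unmodified residues, and all names and example sequences are hypothetical.

```python
from collections import Counter

# Approximate average masses (Da) of unmodified DNA nucleotide residues.
# Assumed for illustration; real fragment masses depend on the modified
# nucleotide used and on the end groups left by the cleavage chemistry.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def cleave(seq: str, base: str) -> list[str]:
    """Fragments left after cleavage at every occurrence of `base`
    (simplifying assumption: the cleaved nucleotide itself is lost)."""
    return [frag for frag in seq.split(base) if frag]


def fragment_masses(seq: str, base: str) -> Counter:
    """Multiset of fragment masses, rounded to 0.01 Da."""
    return Counter(
        round(sum(RESIDUE_MASS[n] for n in frag), 2) for frag in cleave(seq, base)
    )


def mass_differences(seq_a: str, seq_b: str, base: str):
    """Masses seen only for seq_a and only for seq_b, respectively."""
    a, b = fragment_masses(seq_a, base), fragment_masses(seq_b, base)
    return sorted((a - b).elements()), sorted((b - a).elements())


if __name__ == "__main__":
    reference = "CCAGTCATTGCAGG"  # hypothetical known sequence
    variant = "CCAGTCATCGCAGG"    # same, with a single T -> C change
    only_ref, only_var = mass_differences(reference, variant, "A")
    print("fragments:", cleave(reference, "A"), cleave(variant, "A"))
    print("masses only in reference:", only_ref)
    print("masses only in variant:  ", only_var)
```

In this model a single substitution shows up as one fragment mass that disappears and one that appears, without the position or identity of the change needing to be known in advance. A substitution that creates or destroys a cleavage site would instead change the number of fragments.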

In another aspect, this invention relates to a method for detecting a variance in the nucleotide sequence among related polynucleotides by replacing two natural nucleotides in a polynucleotide with two modified nucleotides, the modified nucleotides being selected so that, under the chosen reaction condition, they individually do not impart selective cleavage properties on the modified polynucleotide. Rather, when the two modified nucleotides are contiguous; i.e., the natural nucleotides being replaced were contiguous in the unmodified polynucleotide, they act in concert to impart selective cleavage properties on the modified polynucleotide. In addition to mere proximity, it may also be necessary, depending on the modified nucleotides and reaction conditions selected, that the modified nucleotides are in the proper spatial relationship. For example, without limitation, 5′A-3′G might be susceptible to cleavage while 5′G-3′A might not. As above, once substitution of the modified nucleotides for the natural nucleotides has been accomplished, the modified nucleotide pair is cleaved, the masses of the fragments are determined and the masses are compared, either to the masses expected from a related polynucleotide of known sequence or, if the sequence of at least one of the related polynucleotides is not known, to the masses obtained when the procedure is repeated with other related polynucleotides.
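The dependence of dinucleotide cleavage on both contiguity and orientation can be illustrated with a short sketch. It assumes, purely for illustration, that the strand is cut between the two contiguous modified nucleotides; the actual position of scission depends on the chemistry chosen. The function name and example sequence are hypothetical.

```python
def dinucleotide_cleave(seq: str, five_prime: str, three_prime: str) -> list[str]:
    """Cut `seq` (written 5'->3') wherever `five_prime` is immediately
    followed by `three_prime`.

    Assumption: the cut falls between the two nucleotides. Isolated
    occurrences of either nucleotide are left intact.
    """
    fragments, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] == five_prime and seq[i + 1] == three_prime:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    seq = "TAGCGATAGGA"  # hypothetical example, 5'->3'
    print("5'A-3'G:", dinucleotide_cleave(seq, "A", "G"))
    print("5'G-3'A:", dinucleotide_cleave(seq, "G", "A"))
```

The two calls give different fragment sets from the same sequence, which reflects the point made above that 5′A-3′G and 5′G-3′A need not behave alike. Because a given dinucleotide occurs less often than either of its nucleotides alone, the fragments are on average longer than those from mononucleotide cleavage.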

In another aspect, this invention relates to methods for detecting mono- or dinucleotide cleavage products by electrophoresis or fluorescence resonance energy transfer (FRET), in which the detection event is the appearance or disappearance of fluorescence. Both these methods are particularly well suited for detecting variance at a single site in a polynucleotide where the variance has been previously identified. Knowledge of the particular variance permits the design of electrophoretic or FRET reagents and procedures specifically suited to the rapid, low cost, automatable determination of the status of the variant nucleotide(s). Examples of electrophoretic and FRET detection of cleavage products are described below and in the Figures.

The use of the variance detection methods of this invention for the development of and use as diagnostic or prognostic tools for the detection of predisposition to certain diseases and disorders is another aspect of this invention.

In the development of diagnostic tools, the methods of this invention would be employed to compare the DNA of a test subject which is displaying symptoms of a particular disease or disorder known or suspected to be genetically-related or is displaying a desirable characteristic such as a health enhancing or economically valuable trait such as growth rate, pest resistance, crop yield, etc. with the DNA of healthy members of the same population and/or members of the population which exhibit the same disease, disorder or trait. The test subject may be, without limitation, a human, any other mammal such as rat, mouse, dog, cat, horse, cow, pig, sheep, goat, etc., cold-blooded species such as fish or agriculturally important crops such as wheat, corn, cotton and soy beans. The detection of a statistically significant variance between the healthy members of the population and members of the population with the disease or disorder would serve as substantial evidence of the utility of the test for identifying subjects having or at risk of having the disease or disorder. This could lead to very useful diagnostic tests.

Using the methods of this invention as a diagnostic or prognostic tool, it is entirely unnecessary to know anything about the variance being sought; i.e., its exact location, whether it is an addition, deletion or substitution or what nucleotide(s) have been added, deleted or substituted. The mere detection of the presence of the variance accomplishes the desired task, to diagnose or predict the incidence of a disease or disorder in a test subject. In most instances, however, it would be preferable to be able to create a specific genotyping test for a particular variance with diagnostic or prognostic utility.

Particularly useful aspects of the genotyping methods described herein are ease of assay design, low cost of reagents and suitability of the cleavage products for detection by a variety of methods including, without limitation, electrophoresis, mass spectrometry and fluorescent detection.

In another aspect of this invention, the complete sequence of a polynucleotide may be determined by repeating the above method involving the replacement of one natural nucleotide at each point of occurrence of the natural nucleotide with one modified nucleotide followed by cleavage and mass detection. In this embodiment, the procedure is carried out four times with each of the natural nucleotides; i.e., in the case of DNA, for example but without limitation, each of dA, dC, dG and T is replaced with a modified nucleotide in four separate experiments. The masses obtained from the four cleavage reactions can then be used to determine the complete sequence of the polynucleotide. This method is applicable to polynucleotides prepared by primer extension or amplification by, for example, PCR; in the latter case both strands undergo modified nucleotide replacement.

An additional experiment may be necessary should the preceding procedure leave any nucleotide positions in the sequence ambiguous (see, e.g., the Examples section, infra). This additional experiment may be a repetition of the above procedure using the complementary strand of the DNA being studied if the method involves primer extension. The additional experiment may also be the use of the above described method for replacing two natural nucleotides with two modified nucleotides, cleaving where the modified nucleotides are contiguous and then determining masses of the fragments obtained. Knowledge of the position of contiguous nucleotides in the target polynucleotide may resolve the ambiguity. Another experiment which might be employed to resolve any ambiguity which might occur in the main experiment is one-pass Sanger sequencing followed by gel electrophoresis, which is fast and easy but which alone would not afford highly accurate sequencing. Thus, in conjunction with the methods of this invention, an alternative sequencing method known in the art might, in the case of a specific ambiguity, provide the information necessary to resolve the ambiguity. Combinations of these procedures might also be used. The value of using different procedures lies in the generally recognized observation that each sequencing method has certain associated artifacts that compromise its performance but the artifacts are different for different procedures. Thus, when the goal is highly accurate sequencing, using two or more sequencing techniques which would tend to cancel out each other's artifacts should have great utility. Other additional experiments that might resolve an ambiguity will, based on the disclosures herein and the specific sequence ambiguity at issue, be apparent to those skilled in the art and are, therefore, deemed to be within the scope and spirit of this invention.

In yet another aspect of this invention, the modified nucleotide cleavage reactions described herein may result in the formation of a covalent bond between one of the cleavage fragments and another molecule. This molecule may serve a number of purposes. It may contain a directly detectable label or a moiety that enhances detection of the cleavage products during mass spectrometric, electrophoretic or fluorogenic analysis. For example, without limitation, the moiety may be a dye, a radioisotope, an ion trap to enhance ionization efficiency, an excitable group which can contribute to desorption efficiency or simply a large molecule which globally alters desorption and/or ionization characteristics. The labeling reaction may be partial or complete. An example of the use of homogeneously labeled DNA fragments of controllable size is in DNA hybridization such as hybridization probes for DNA on high-density arrays like DNA chips.

An additional aspect of this invention is the replacement of a natural nucleotide with a modified nucleotide at only a percentage of the points of occurrence of that natural nucleotide in a polynucleotide. This percentage may be from about 0.01% to about 95%, preferably it is from about 0.01% to about 50%, more preferably from about 0.01% to about 10% and most preferably from about 0.01% to about 1%. The percent replacement is selected to be complementary to the efficiency of the cleavage reaction selected. That is, if a cleavage reaction of low efficiency is selected, then a higher percentage of substitution is permissible; if a cleavage reaction of high efficiency is selected, then a low percentage of replacement is preferred. The result desired is that, on the average, each individual strand of polynucleotide is cleaved once so that a sequencing ladder, such as that described for the Maxam-Gilbert and Sanger procedures, can be developed. Since the cleavage reactions described herein are of relatively high efficiency, low percentages of replacement are preferred to achieve the desired single cleavage per polynucleotide strand. Low percentages of replacement may also be more readily achieved with available polymerases. However, based on the disclosures herein, other cleavage reactions of varying degrees of efficiency will be apparent to those skilled in the art and, as such, are within the scope and spirit of this invention. It is, in fact, an aspect of this invention that, using cleavage reactions of sufficiently low efficiency, which, in terms of percentage cleavage at points of incorporation of a modified nucleotide in a modified polynucleotide may be from about 0.01% to 50%, preferably from about 0.01% to 10% and, most preferably, from about 0.01% to about 1%, a polynucleotide in which a natural nucleotide has been replaced with a modified nucleotide at substantially each point of occurrence may still be used to generate the sequencing ladder. At the most preferred level of efficiency, about 0.01% to about 1%, each strand of a fully modified polynucleotide should, on the average, only be cleaved once.
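The trade-off between percent replacement and cleavage efficiency can be made concrete with a back-of-the-envelope calculation. The sketch assumes that every site is replaced and cleaved independently of the others, so that the expected number of cuts per strand is the product of the number of sites, the fraction replaced and the cleavage efficiency. The function names and the figure of 250 sites are hypothetical.

```python
def expected_cleavages(n_sites: int, fraction_replaced: float,
                       cleavage_efficiency: float) -> float:
    """Expected cuts per strand, assuming independent sites:
    n_sites * fraction_replaced * cleavage_efficiency."""
    return n_sites * fraction_replaced * cleavage_efficiency


def replacement_for_single_cut(n_sites: int, cleavage_efficiency: float) -> float:
    """Fraction of sites to replace so that, on average, each strand is
    cut once (capped at 1.0, i.e. full replacement)."""
    return min(1.0, 1.0 / (n_sites * cleavage_efficiency))


if __name__ == "__main__":
    sites = 250  # hypothetical count of the replaced nucleotide in a strand

    # Highly efficient cleavage: only a small fraction should be replaced.
    f = replacement_for_single_cut(sites, cleavage_efficiency=1.0)
    print(f"replace {f:.2%} of sites at 100% cleavage efficiency")

    # Fully modified strand: a low-efficiency cleavage gives the same result.
    print(expected_cleavages(sites, fraction_replaced=1.0,
                             cleavage_efficiency=0.004))
```

Under this model the two routes described above are equivalent: a small fraction of replaced sites cleaved efficiently, or a fully modified strand cleaved at low efficiency, both give on average about one cut per strand.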

In another aspect, this invention relates to methods for producing and identifying polymerases with novel properties with respect to incorporation and cleavage of modified nucleotides.

A. Nucleotide Modification and Cleavage

(1) Base Modification and Cleavage

A modified nucleotide may contain a modified base, a modified sugar, amodified phosphate ester linkage or a combination of these.

Base-modification is the chemical modification of the adenine, cytosine,guanine or thymine (or, in the case of RNA, uracil) moiety of anucleotide such that the resulting chemical structure renders themodified nucleotide more susceptible to attack by a reagent than anucleotide containing the unmodified base. The following are examples,without limitation of base modification. Other such modification ofbases will become readily apparent to those skilled in the art in lightof the disclosures herein and therefore are to be considered to bewithin the scope and spirit of this invention (e.g., the use ofdifluorotoluene; Liu, D., at al., Chem. Biol., 4:919-929, 1997; Moran,S., et al., Proc. Natl. Acad. Sci. USA, 94:10506-10511, 1997).

Some examples, without limitation, of such modified bases are described below.

1. Adenine (1) can be replaced with 7-deaza-7-nitroadenine (2). The 7-deaza-7-nitroadenine is readily incorporated into polynucleotides by enzyme-catalyzed polymerization. The 7-nitro group activates C-8 to attack by chemical base such as, without limitation, aqueous sodium hydroxide or aqueous piperidine, which eventually results in specific strand scission. Verdine, et al., JACS, 1996, 118:6116-6120;

It has been found that cleavage with piperidine is not always complete whereas complete cleavage is the desired result. However, when the cleavage reaction is carried out in the presence of a phosphine derivative, for example, without limitation, tris(2-carboxyethyl)phosphine (TCEP) and a base, complete cleavage is obtained. An example of such a cleavage reaction is as follows: DNA modified by incorporation of 7-nitro-7-deaza-2′-deoxyadenosine is treated with 0.2 M TCEP/1 M piperidine/0.5 M Tris base at 95° C for one hour. Denaturing polyacrylamide gel (20%) analysis showed complete cleavage. Other bases such as, without limitation, NH₄OH can be used in place of the piperidine and Tris base. This procedure, i.e., the use of a phosphine in conjunction with a base, should be applicable to any cleavage reaction in which the target polynucleotide has been substituted with a modified nucleotide that is labile to piperidine.

The product of cleavage with TCEP and base is unique. Mass spectrometry analysis was consistent with a structure having a phosphate-ribose-TCEP adduct at 3′ ends and a phosphate moiety at 5′ ends, i.e. structure 3.

How TCEP participates in the fragmentation of a modified polynucleotide is not presently known; however, without being held to any particular theory, we believe that the mechanism may be the following:

The incorporation of the TCEP (or other phosphine) into the cleavage product should be a very useful method for labeling fragmented polynucleotides at the same time cleavage is being performed. By using an appropriately functionalized phosphine that remains capable of forming an adduct at the 3′ end ribose as described above, such functionalities, without limitation, as mass tags, fluorescence tags, radioactive tags and ion-trap tags could be incorporated into a fragmented polynucleotide. Phosphines that contain one or more tags and that are capable of covalently bonding to a cleavage fragment constitute another aspect of this invention. Likewise, the use of such tagged phosphines as a method for labeling polynucleotide fragments is another aspect of this invention.

While other phosphines, which may become apparent to those skilled in the art based on the disclosures herein, may be used to prepare labeled phosphines for incorporation onto nucleotide fragments, TCEP is a particularly good candidate for labeling. For instance, the carboxy (—C(O)OH) groups may be modified directly by numerous techniques, for example, without limitation, reaction with an amine, alcohol or mercaptan in the presence of a carbodiimide to form an amide, ester or mercaptoester as shown in the following reaction scheme:

wherein M¹ and M² are independently O, NH, NR or S; R¹ and R² are mass tags, fluorescent tags, radioactive tags, ion trap tags or combinations thereof.

When a carboxy group is reacted with a carbodiimide in the absence of a nucleophile (the amine in this case), the adduct between the carbodiimide and the carboxy group may rearrange to form a stable N-acylurea. If the carbodiimide contains a fluorophore, the resultant phosphine will then carry that fluorophore as shown in the following reaction scheme:

Amino group-containing fluorophores such as fluoresceinyl glycine amide (5-(aminoacetamido)fluorescein), 7-amino-4-methylcoumarin, 2-aminoacridone, 5-aminofluorescein, 1-pyrenemethylamine and 5-aminoeosin may be used to prepare the labeled phosphines of this method. Amino derivatives of lucifer yellow and Cascade Blue may also be used, as can amino derivatives of biotin. In addition, hydrazine derivatives such as rhodamine and Texas Red hydrazine may be useful in this method.

Fluorescent diazoalkanes, such as, without limitation, 1-pyrenyldiazomethane, may also be used to form esters with TCEP.

Fluorescent alkyl halides may also react with the anion of the carboxy group, i.e., the C(O)O⁻ group, to form esters. Halides such as, without limitation, panacyl bromide, 3-bromoacetyl-7-diethylaminocoumarin, 6-bromoacetyl-2-diethylaminonaphthalene, 5-bromomethylfluorescein, BODIPY® 493/503 methyl bromide, monobromobimanes and iodoacetamides such as coumarin iodoacetamide may serve as effective label-carrying moieties which will covalently bond with TCEP.

Naphthalimide sulfonate ester reacts rapidly with the anions of carboxylic acids in acetonitrile to give adducts which are detectable by absorption at 259 nm down to 100 femtomoles and by fluorescence at 394 nm down to four femtomoles.

There are, furthermore, countless amine-reactive fluorescent probes available and it is possible to convert TCEP into a primary amine by the following reaction:

The aminophosphine can then be used to form label-containing aminophosphines for use in the cleavage/labeling method described herein.

The above dyes and procedures for covalently bonding them to TCEP are but a few examples of the possible adducts which can be formed. A source of additional reagents and procedures is the catalog of Molecular Probes, Inc. Based on the disclosures herein and resources such as the Molecular Probes catalog, many other ways to modify phosphines, in particular TCEP, will be apparent to those skilled in the art. Those other ways to modify phosphines for use in the incorporation of labels into polynucleotide fragments during chemical cleavage of the polynucleotide are within the scope and spirit of this invention.

2. Cytosine (4) can be replaced with 5-azacytosine (5). 5-Azacytosine is likewise efficiently incorporated into polynucleotides by enzyme-catalyzed polymerization. 5-Azacytosine is susceptible to cleavage by chemical base, particularly aqueous base, such as aqueous piperidine or aqueous sodium hydroxide. Verdine, et al., Biochemistry, 1992, 31:11265-11273;

3(a). Guanine (6) can be replaced with 7-methylguanine (7), which can likewise be readily incorporated into polynucleotides by polymerases (Verdine, et al., JACS, 1991, 113:5104-5106) and is susceptible to attack by chemical base, such as, without limitation, aqueous piperidine (Siebenlist, et al., Proc. Natl. Acad. Sci. USA, 1980, 77:122); or,

3(b). Gupta and Kool, Chem. Commun. 1997, pp 1425-26 have demonstrated that N⁶-allyl-dideoxyadenine, when incorporated into a DNA strand, will cleave on treatment with a mild electrophile, E⁺, in their case iodine. The proposed mechanism is shown in Scheme 1:

A similar procedure might be employed with guanine using the previously unreported 2-allylaminoguanine derivative 8, which can be prepared by the procedure shown in Scheme 2:

Other ways to synthesize compound 8 will become apparent based on the disclosures herein; such syntheses are considered within the spirit and scope of this invention. A polynucleotide strand into which the resulting N²-allylguanosine triphosphate has been incorporated should be susceptible to cleavage in a similar manner to the N⁶-allyladenine nucleotide of Gupta, i.e. by the mechanism shown in Scheme 3:

4. Either thymine (9) or uracil (10) may be replaced with 5-hydroxyuracil (11) (Verdine, JACS, 1991, 113:5104). As with the above-modified bases, the nucleotide prepared from 5-hydroxyuracil can also be incorporated into a polynucleotide by enzyme-catalyzed polymerization. Verdine, et al., JACS, 1993, 115:374-375. Specific cleavage is accomplished by first treating the 5-hydroxyuracil with an oxidizing agent, for instance, aqueous permanganate, and then with a chemical base such as, without limitation, aqueous piperidine (Verdine, ibid.).

5. Pyrimidines substituted at the 5-position with an electron-withdrawing group such as, without limitation, nitro, halo or cyano, should be susceptible to nucleophilic attack at the 6-position followed by base-catalyzed ring opening and subsequent degradation of the phosphate linkage. An example, which is not to be construed as limiting the scope of this technique in any manner, is shown in (Scheme 4) using 5-substituted cytidine. If the cleavage is carried out in the presence of tris(carboxyethyl)phosphine (TCEP), the adduct 10 may be obtained and, if the TCEP is functionalized with an appropriate moiety (q.v. infra), labeled polynucleotide fragments may be obtained.

Although, as shown above, using TCEP in the cleavage reaction can result in the formation of the chemically stable adduct, secondary amines such as piperidine, pyrrolidine, morpholine, diethylamine (and homologs thereof) may also be used for labeling fragments during cleavage. In FIG. 38, DNA cleavage and fluorescence labeling using a secondary amine is shown. Oxidation using potassium permanganate results in a labile intermediate that reacts with the amine to form a stable secondary amine-DNA adduct. The secondary amines could be derivatized with fluorophores or radioactive moieties for detection purposes.

(2) Sugar Modification and Cleavage

Modification of the sugar portion of a nucleotide may also afford a modified polynucleotide that is susceptible to selective cleavage at the site(s) of incorporation of such modification. In general, the sugar is modified to include one or more functional groups which render the 3′ and/or the 5′ phosphate ester linkage more labile; i.e. susceptible to cleavage, than the 3′ and/or 5′ phosphate ester linkage of a natural nucleotide. The following are examples, without limitation, of such sugar modifications. Other sugar modifications will become readily apparent to those skilled in the art in light of the disclosures herein and are therefore deemed to be within the scope and spirit of this invention. In the formulas which follow, B and B′ refer to any base and they may be the same or different.

1. In a deoxyribose-based polynucleotide, replacement of one or more of the deoxyribonucleosides with a ribose analog; e.g., without limitation, substituting adenosine (12) for deoxyadenosine (13) renders the resultant modified polynucleotide susceptible to selective cleavage by chemical bases such as, without limitation, aqueous sodium hydroxide or concentrated ammonium hydroxide, at each point of occurrence of adenosine in the modified polynucleotide (Scheme 5);

2. A 2′-ketosugar (14, synthesis: JACS, 1967, 89:2697) may be substituted for the sugar of a deoxynucleotide; upon treatment with chemical base such as, without limitation, aqueous hydroxide, the keto group equilibrates with its ketal form (15) which then attacks the phosphate ester linkage effecting cleavage (Scheme 6);

3. A deoxyribose nucleotide can be replaced with its arabinose analog; i.e., a sugar containing a 2″-hydroxy group (16). Again, treatment with mild (dilute aqueous) chemical base effects the intermolecular displacement of a phosphate ester linkage resulting in cleavage of the polynucleotide (Scheme 7):

4. A deoxyribose nucleotide can be replaced by its 4′-hydroxymethyl analog (17, synthesis: Helv. Chim. Acta, 1966, 79:1980) which, on treatment with mild chemical base such as, without limitation, dilute aqueous hydroxide, likewise displaces a phosphate ester linkage causing cleavage of the polynucleotide as shown in Scheme 8:

5. A deoxyribose nucleotide can be replaced by its 4′-hydroxy carbocyclic analog; i.e., a 4-hydroxymethylcyclopentane derivative (18) which, on treatment with aqueous base, results in the cleavage of the polynucleotide at a phosphate ester linkage as shown in Scheme 9:

6. A sugar ring may be replaced with its carbocyclic analog, which is further substituted with a hydroxyl group (19). Depending on the stereochemical positioning of the hydroxyl group on the ring, either a 3′ or a 5′ phosphate ester linkage can be selectively cleaved on treatment with mild chemical base (Scheme 10):

7. In each of examples 1, 3, 4, 5 and 6, above, the hydroxy group which attacks the phosphate ester linkage may be replaced with an amino group (—NH₂). The amino group may be generated in situ from the corresponding azidosugar by treatment with tris(2-carboxyethyl)phosphine (TCEP) after the azide-modified polynucleotide has been formed (Scheme 11). The amino group, once formed, spontaneously attacks the phosphate ester linkage resulting in cleavage.

8. A sugar may be substituted with a functional group which is capable of generating a free radical such as, without limitation, a phenylselenyl (PhSe—) or a t-butyl ester group (^(t)BuC(═O)—) (Angew. Chem. Int. Ed. Engl. 1993, 32:1742-43). Treatment of the modified sugar with ultraviolet light under anaerobic conditions results in the formation of a C₄′ radical whose fragmentation causes the excision of the modified nucleotide and thereby the cleavage of the polynucleotide at the modified nucleotide (Scheme 12). The free radicals may be generated either prior to or during the laser desorption/ionization process of MALDI mass analysis. Modified nucleotides with other photo-labile 4′ substituents such as, without limitation, 2-nitrobenzyl groups or 3-nitrophenyl groups (Synthesis, 1980, 1-26) and bromo or iodo groups may also be used as precursors to form a C₄′ radical.

9. An electron-withdrawing group may be incorporated into the sugar such that the nucleotide is either rendered susceptible to β-elimination (when W is cyano (a “cyanosugar” 20)) or the oxyanion formed by the hydrolysis of the 3′-phosphate linkage is stabilized and thus hydrolysis with mild chemical base will be preferred at the modified sugar; such electron-withdrawing groups include, without limitation, cyano (—C≡N), nitro (—NO₂), halo (in particular, fluoro), azido (—N₃) or methoxy (—OCH₃) (Scheme 13):

A cyano sugar can be prepared by a number of approaches, one of which is shown in (Scheme 14). Other methods will no doubt be apparent to those skilled in the art based on the disclosures herein; such alternate approaches to cyano (or other electron-withdrawing group) substituted sugars are within the spirit and scope of this invention.

10. The ring oxygen of a sugar may be replaced with another atom; e.g., without limitation, a nitrogen to form a pyrrole ring (21). Or, another heteroatom may be placed in the sugar ring in place of one of the ring carbon atoms; for example, without limitation, a nitrogen atom to form an oxazole ring (22). In either case, the purpose of the different or additional heteroatom is to render the phosphate ester linkage of the resulting non-natural nucleotide more labile than that of the natural nucleotide (Scheme 15):

11. A group such as, without limitation, a mercapto group may be incorporated at the 2″ position of a sugar ring which group, on treatment with mild chemical base, forms a ring by elimination of the 3′-phosphate ester (Scheme 16).

12. A keto group can be incorporated at the 5′ position such that the resulting phosphate has the lability of an anhydride, i.e., structure 23. A nucleotide triphosphate such as 23 may be synthesized by the procedure shown in Scheme 17. It is recognized that other routes to such nucleotide triphosphates may become apparent to those skilled in the art based on the disclosures herein; such syntheses are within the spirit and scope of this invention.

Polynucleotides into which nucleotide triphosphates of structure 23 have been incorporated should, like analogous mixed anhydrides, be susceptible to alkaline hydrolysis as shown in Scheme 18:

13. The phosphate linkage could be turned into the relatively more labile enol ester linkage by the incorporation of a double bond at the 5′ position, that is, a nucleotide triphosphate of structure 24 could be used. A nucleotide triphosphate of structure 24 can be prepared by the procedure shown in Scheme 19. It is again understood that other ways to produce structure 24 may be apparent to those skilled in the art based on the disclosures herein; as before, these alternate syntheses are well within the spirit and scope of this invention.

The enol ester would be susceptible to alkaline cleavage (Scheme 20).

14. Difluoro substitution at the 5′ position would increase the lability of the phosphate linkage and would also push the reaction to completion by virtue of the hydrolysis of the intermediate difluorohydroxy group to an acid group as shown in Scheme 22. The dihalo derivative could be synthesized by the procedure shown in Scheme 21. Once again, the route shown in Scheme 21 is not the only way possible to make the difluoronucleotide triphosphate. However, as above, these other routes would be apparent based on the disclosures herein and would be within the spirit and scope of this invention.

(3) Phosphate Ester Modification and Cleavage

Modification of the phosphate ester of a nucleotide results in modification of the phosphodiester linkages between the 3′-hydroxy group of one nucleotide and the 5′-hydroxy group of the adjacent nucleotide such that one or the other of the modified 3′ or 5′ phosphate ester linkages is rendered substantially more susceptible to cleavage than the corresponding unmodified linkage. Since the phosphodiester linkage forms the backbone of a polynucleotide, this modification method will, herein, be referred to alternatively as “backbone modification.” The following are non-limiting examples of backbone modification. Other such modifications will become apparent to those skilled in the art based on the disclosures herein and therefore are deemed to be within the scope and spirit of this invention.

1. Replacement of an oxygen in the phosphate ester linkage with a sulfur; i.e., creation of a phosphorothioate linkage (25a, 25b, 25c) which either directly on treatment with mild base (Schemes 23(a) and 23(b)) or on treatment with an alkylating agent, such as, for instance, methyl iodide, followed by treatment with strong non-aqueous organic base, for example, methoxide (Scheme 23(c)), results in the selective cleavage of the phosphothioester linkage. Alternatively, phosphorothioate linkages such as those in Formula 14 may also be selectively cleaved through laser photolysis during MALDI mass analysis. This in-source fragmentation procedure (Internat'l J. of Mass Spec. and Ion Process, 1997, 169/170:331-350) consolidates polynucleotide cleavage and analysis into one step;

2. Replacement of an oxygen in the phosphate linkage with a nitrogen creating a phosphoramidate linkage (26) which, on treatment with, for instance and without limitation, dilute aqueous acid, will result in selective cleavage (Scheme 24);

3. Replacement of one of the free oxygen atoms attached to the phosphorus of the phosphate backbone with an alkyl group, such as, without limitation, a methyl group, to form a methylphosphonate linkage, which, on treatment with strong non-aqueous organic base, such as, without limitation, methoxide, will likewise result in selective cleavage (Scheme 25).

4. Alkylation of the free oxyanion of a phosphate ester linkage with an alkyl group such as, without limitation, a methyl group will, on treatment with strong non-aqueous organic base such as, without limitation, methoxide, result in the selective cleavage of the resulting alkylphosphorotriester linkage (Scheme 26).

5. Treatment of a phosphorothioate with β-mercaptoethanol in a strong base such as, without limitation, methanolic sodium methoxide, in which the mercaptoethanol exists primarily as the disulfide, could result in the formation of a mixed disulfide, which would then degrade, with or without rearrangement, to give the cleavage products shown in Scheme 27.

(4) Dinucleotide Modification and Cleavage

The previous substitutions are all single substitutions; that is, one modified nucleotide is substituted for one natural nucleotide wherever the natural nucleotide occurs in the target polynucleotide or, if desired, at a fraction of such sites. In an additional aspect of this invention, multiple substitutions may be used. That is, two or more different modified nucleotides may be substituted for two or more different natural nucleotides, respectively, wherever the natural nucleotides occur in a subject polynucleotide. The modified nucleotides and cleavage conditions are selected such that, under the proper cleavage conditions, they do not individually confer selective cleavage properties on a polynucleotide. When, however, the proper cleavage conditions are applied and the modified nucleotides are incorporated into the polynucleotide in a particular spatial relationship to one another, they interact to jointly render the polynucleotide selectively cleavable. Preferably, two modified nucleotides are substituted for two natural nucleotides in a polynucleotide; thus, this method is referred to herein as “dinucleotide modification.” It is important to note that, individually, each of the two modified nucleotides may elicit specific and selective cleavage of a polynucleotide albeit under quite different, typically more vigorous chemical conditions.

As used herein, “spatial relationship” refers to the 3-dimensional relationship between two or more modified nucleotides after substitution into a polynucleotide. In a preferred embodiment of this invention, two modified nucleotides must be contiguous in a modified polynucleotide in order to impart altered cleavage properties on the modified polynucleotide. By employing two modified nucleotides in this manner, and then cleaving the modified polynucleotide, the relationship between two natural nucleotides in a target polynucleotide can be established depending on the nature of the multiple substitution selected. That is, the natural nucleotides being replaced would also have been adjacent to one another in the natural polynucleotide. For example, without limitation, if a modified A and modified G are substituted at every point of occurrence of the corresponding natural A and natural G, respectively, the modified polynucleotide will be rendered selectively cleavable only where the natural A and G were directly adjacent, i.e., AG or GA (but not both), in the naturally-occurring polynucleotide. As shown below, proper choice of the modified nucleotides will also reveal the exact relationship of the nucleotides, i.e., in the example above, whether the nucleotide sequence in the natural polynucleotide was AG or GA. The following are non-limiting examples of multiple substitutions. Other multiple substitutions will become apparent to those skilled in the art based on the disclosures set forth herein and therefore are deemed to be within the scope and spirit of this invention.

1. One modified nucleotide may contain a functional group capable of effecting nucleophilic substitution while the companion modified nucleotide is modified so as to render it a selective leaving group. The nucleophile and the leaving group may be in a 5′-3′ orientation or in a 3′-5′ orientation with respect to one another. A non-limiting example of this is shown in Scheme 28. The 2′ or 2″ hydroxy group on one modified nucleotide, when treated with mild chemical base, becomes a good nucleophile. The other modified nucleotide contains a 3′ or 5′ thiohydroxy (—SH) group which forms a 3′ or 5′ phosphorothioate linkage when incorporated into the modified polynucleotide. This phosphorothioate linkage is selectively more labile than a normal phosphodiester linkage. When treated with mild base, the oxyanion formed from the hydroxy group of one modified nucleotide selectively displaces the thiophosphate linkage to the other modified nucleotide resulting in cleavage. As shown in Scheme 28(a) and 28(b), depending on the stereochemical relationship between the hydroxy group and the thiophosphate linkage, cleavage will occur either to the 3′ or the 5′ side of the hydroxy-containing modified nucleotide. Thus, the exact relationship of the natural nucleotides in the naturally occurring polynucleotide is revealed.

2(a). If one modified nucleotide contains a 3′ or 5′ amino (—NH₂) group and the other modified nucleotide contains a 5′ or 3′ hydroxy group, respectively, treatment of the resulting phosphoroamidate-linked polynucleotide with mild acid results in the protonation of the amino group of the phosphoroamidate linkage which then becomes a very good leaving group. Once again, depending on the spatial relationship between the hydroxy group of one modified nucleotide and the amino group of the other modified nucleotide, the exact relationship of the nucleotides in the naturally occurring polynucleotide can be determined as shown in Schemes 29(a) and 29(b).

Dinucleotide cleavage of a ribonucleotide/5′-aminonucleotide 5′-3′ linkage is a presently preferred embodiment of this invention. Examples of this method are shown in FIGS. 21-26.

2(b). When the amino group of the modified nucleotide is 5′, a ribonucleotide/5′-amino-2′,5′-dideoxynucleotide pair may be cleaved during the polymerization process. For example, without limitation, cleavage occurs during the incorporation of adenine ribonucleotide and 5′-aminodideoxythymine nucleotide into a polynucleotide using a combination of wild type Klenow (exo-) and mutant E710A Klenow (exo-) polymerases. E710A is a mutant Klenow (exo-) polymerase in which a glutamate at residue 710 has been replaced by alanine. The E710A mutant is more efficient at incorporating both ribonucleotides and deoxyribonucleotides into a single nascent polynucleotide strand than Klenow (exo-). Other polymerases with similar properties will be apparent to those skilled in the art based on the disclosures herein and their use for the incorporation of ribonucleotide and 5′-amino-2′,5′-dideoxynucleotide into a polynucleotide with subsequent cleavage during the polymerization reaction is within the scope and spirit of this invention.

When a 5′-end radiolabeled primer was extended using a mixture of Klenow (exo-) and E710A Klenow (exo-), only one fragment (the 5′-end fragment) was observed, indicating complete cleavage at the ribonucleotide-5′-aminonucleotide sites. We have shown (FIGS. 21-26) that the polymerization and cleavage occur in the same step. Presumably, cleavage is thermally induced during protein-DNA contact. The figures show that the polymerases continue to extend the template even after cleavage, which also suggests that the cleavage is the result of protein-DNA contact. While USB brand Klenow polymerase (Amersham) was also able to incorporate the two nucleotides, it was not as efficient as the mixture of polymerases and, furthermore, multiple product bands were observed, indicating incomplete cleavage at the AT sites.

The above is, of course, a specific example of a general concept. That is, other wild type polymerases, mutant polymerases or combinations thereof should likewise be capable of cleaving, or facilitating cleavage of, modified nucleotides or dinucleotides during the polymerization procedure. The procedure for determining the exact combinations of polymerase(s) and nucleotide modifications that result in cleavage, based on the disclosures herein, will be apparent to those skilled in the art. For instance, as is described below, it may be useful to generate a library of mutant polymerases and select specifically for those that induce dinucleotide cleavage. Thus, a polymerase or a combination of polymerases which cause the cleavage of a forming modified polynucleotide during the polymerization process is yet another aspect of this invention, as are the method of cleaving a modified polynucleotide during the polymerization process using a polymerase or combination of polymerases and the modified nucleotide(s) necessary for the cleavage to occur.

3. An electron-withdrawing group can be placed on a sugar carbon adjacent to the carbon which is bonded to the hydroxy group participating in the ester linkage of a methylphosphonate (Scheme 30(a)) or methylphosphotriester (Scheme 30(b)) backbone. This will result in increased stability of the oxyanion formed when the phosphate group is hydrolyzed with mild chemical base (Scheme 30) and thus selective hydrolysis of those phosphate linkages compared to phosphate linkages not adjacent to such hydroxy groups.

4. An electron-withdrawing group can be placed on the 4′ carbon of a nucleotide that is linked through its 5′-hydroxy group to the 3′-hydroxy group of an adjacent ribonucleotide. Treatment with dilute base will result in cleavage as shown in Scheme 31.

5. A 2′ or 4′ leaving group in a sugar may be susceptible to attack by the sulfur of a phosphorothioate as shown in Schemes 32 and 33 to afford the desired cleavage:

6. Ethylene sulfide could effect the cleavage of a 2′ fluoro derivative of a sugar next to a phosphorothioate according to Scheme 34:

β-Mercaptoethanol or a similar reagent may be substituted for the ethylene sulfide.

7. A phosphorothioate might coordinate with a metal oxidant such as, without limitation, Cu^(II) or Fe^(III), which would be held in close proximity to the 2′ hydroxy group of an adjacent ribonucleotide. Selective oxidation of the 2′ hydroxy group to a ketone should render the adjacent phosphate linkage more susceptible to cleavage under basic conditions than the corresponding ribonucleotides or deoxyribonucleotides as shown in Scheme 35:

The preceding cleavage reactions may be carried out in such a manner as to cause cleavage at substantially all points of occurrence of the modified nucleotide or, in the case of multiple substitutions, all points of occurrence of two or more modified nucleotides in the proper spatial relationship. On the other hand, by controlling the amount of cleaving reagent and the reaction conditions, cleavage can be partial; i.e., cleavage will occur at only a fraction of the points of occurrence of a modified nucleotide or pairs of modified nucleotides.
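The order-specific nature of dinucleotide cleavage discussed in this section, i.e., cleavage at AG but not at GA (or the reverse), can be illustrated with a short sketch. It is offered only as an illustration under stated assumptions: sequences are written 5′ to 3′, cleavage is complete, and the cut is placed between the two contiguous modified nucleotides without modeling which fragment retains the phosphate. The function name and the example sequence are hypothetical.

```python
def cleave_at_dinucleotide(sequence, dinucleotide="AG"):
    """Split a 5'->3' sequence between the two bases of every occurrence
    of `dinucleotide`.

    Assumption (illustrative only): complete cleavage, with the cut placed
    between the two contiguous modified nucleotides.
    """
    if len(dinucleotide) != 2:
        raise ValueError("dinucleotide must contain exactly two bases")
    fragments = []
    start = 0
    for i in range(len(sequence) - 1):
        if sequence[i:i + 2] == dinucleotide:
            fragments.append(sequence[start:i + 1])  # up to the first base
            start = i + 1                            # next fragment starts at the second base
    fragments.append(sequence[start:])
    return fragments


# Hypothetical target containing two AG steps and one GA step.
target = "TCAGGATAGC"
ag_fragments = cleave_at_dinucleotide(target, "AG")
ga_fragments = cleave_at_dinucleotide(target, "GA")
print(ag_fragments)
print(ga_fragments)
```

Because the AG and GA fragment sets differ, comparing them distinguishes the two orientations of the same pair of nucleotides in the natural polynucleotide.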

B. Fragmenting Modified Polynucleotides in Mass Spectrometers

The preceding discussion relates to chemical methods for cleaving polynucleotides at sites where modified nucleotides have been incorporated. However, besides fragmenting polynucleotide molecules chemically in solution, it is a further aspect of this invention that fragmentation is accomplished within a mass spectrometer using chemical or physical means. Further, by manipulating the conditions within the mass spectrometer, the extent of fragmentation can be controlled. The ability to control the degree of fragmentation of chemically modified oligonucleotides can be very useful in determining relationships between adjacent sequences. This is because, while mass spectrometric (MS) analysis of a completely cleaved polynucleotide provides the masses and therefore the nucleotide content of each fragment polynucleotide, determining the order in which these fragment polynucleotides are linked together in the original (analyte) polynucleotide is a difficult problem. By relaxing the stringency of cleavage one can generate fragments that correspond to two or more fragments from the complete cleavage set. The mass of these compound fragments provides the information that permits the inference that the two component fragments are adjacent in the original polynucleotide. By determining that multiple different pairs or triplets of complete cleavage fragments are adjacent to each other, eventually a much larger sequence can be pieced together than if one must rely solely on analysis of complete cleavage fragments. The ability to control the conditions of fragmentation by manipulation in the mass spectrometer is particularly advantageous because, in contrast to the iterative generation and subsequent testing of partial cleavages in a test tube, the effect of various partial cleavage conditions can be directly observed in real time and instantaneously manipulated to provide the optimal partial cleavage data set(s). For some purposes, use of several partial cleavage conditions may be very useful as successive levels of partial cleavage will provide a cumulative picture of the relationships between ever-larger fragments. Specific mechanisms for fragmentation of modified polynucleotides are described below.
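The inference of adjacency from compound fragments can be sketched as a simple mass-matching step. The following is a minimal illustration, not a description of any particular instrument software. The masses are invented round numbers, and `junction_delta` (the mass difference between an intact linkage and the two cleaved ends, which depends on the cleavage chemistry) and `tolerance` are hypothetical parameters introduced only for this example.

```python
from itertools import combinations


def infer_adjacent_pairs(complete_masses, compound_masses,
                         junction_delta=0.0, tolerance=0.5):
    """Return (i, j, observed_mass) for every pair of complete-cleavage
    fragments whose summed mass explains an observed compound fragment.

    junction_delta and tolerance are illustrative assumptions; real values
    depend on the cleavage chemistry and the mass accuracy available.
    """
    pairs = []
    for observed in compound_masses:
        for (i, m_i), (j, m_j) in combinations(enumerate(complete_masses), 2):
            if abs(m_i + m_j + junction_delta - observed) <= tolerance:
                pairs.append((i, j, observed))
    return pairs


# Invented masses (Da) for four complete-cleavage fragments.
complete = [1200.0, 1850.0, 2400.0, 950.0]

# Invented masses observed under relaxed (partial) cleavage conditions.
compound = [3050.0, 3350.0]

adjacent = infer_adjacent_pairs(complete, compound)
print(adjacent)
```

Each matched pair suggests that the two complete-cleavage fragments are neighbors in the analyte; when more than one pair fits an observed mass within the tolerance, the assignment remains ambiguous and further partial cleavage conditions would be needed to resolve it.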

First, by choice of appropriate ionization methods, fragmentation can be induced during the ionization process. Alternatively, in the tandem mass spectrometry (MS/MS) approach, ions with mass-to-charge ratios (m/z) of interest can be selected and then activated by a variety of procedures including collision with molecules, ions or electrons, or the absorption of photons of various wavelengths, leading to the fragmentation of the ions. In one aspect, ionization and fragmentation of the polynucleotide molecules can be achieved with fast atom bombardment (FAB). In this approach, modified polynucleotide molecules are dissolved in a liquid matrix such as glycerol, thioglycerol, or other glycerol analogs. The solution is deposited on a metallic surface. Particles with thousands of electron volts of kinetic energy are directed at the liquid droplet. Depending on the modification of the polynucleotides, partial fragmentation or complete fragmentation at every modified nucleotide can be achieved.

In another aspect, ionization and fragmentation can be effected by matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS). In MALDI-MS a solution of modified polynucleotide molecules is mixed with a matrix solution, e.g., 3-hydroxypicolinic acid in aqueous solution. An aliquot of the mixture is deposited on a solid support, typically a metallic surface with or without modification. Lasers, preferably with a wavelength between 3 μm and 10.6 μm, are used to irradiate the modified polynucleotide/matrix mixture. To analyze in-source fragmentation (ISF) products, delayed extraction can be employed. To analyze post-source decay (PSD) products, an ion reflector can be employed.

In another approach, ionization and fragmentation can be accomplished by electrospray ionization (ESI). In this procedure, the solution of modified DNA is sprayed through the orifice of a needle with a few kilovolts of voltage applied. Fragmentation of the modified polynucleotide molecules would occur during the desolvation process in the nozzle-skimmer (NS) region. The degree of fragmentation will depend on the nature of the modification as well as on factors such as the voltage between the nozzle and skimmer, the flow rate and the temperature of the drying gas. If a capillary is used to assist the desolvation, then it is the voltage between the exit of the capillary and the skimmer and the temperature of the capillary that need to be controlled to achieve the desired degree of fragmentation.

In yet another technique, modified polynucleotide molecules can be selectively activated and dissociated. Activation can be accomplished by accelerating precursor ions to a kinetic energy of a few hundred to a few million electron volts and then causing them to collide with neutral molecules, preferably of a noble gas. In the collision some of the kinetic energy of the precursor ions is converted into internal energy and causes fragmentation. Activation can also be accomplished by allowing accelerated precursor ions to collide onto a conductive or semi-conductive surface. Activation can also be accomplished by allowing accelerated precursor ions to collide with ions of opposite polarity. In another approach, activation can be accomplished by electron capturing. In this technique, the precursor ions are allowed to collide with thermalized electrons. Activation can also be accomplished by irradiating the precursor ions with photons of various wavelengths, preferably in the range of 193 nm to 10.6 μm. Activation can also be accomplished by heating vacuum chambers for trapped ions; the heating of vacuum chamber walls causes blackbody IR irradiation (Williams, E. R., Anal. Chem., 1998, 70:179A-185A). The presence of modified nucleotides in a polynucleotide could also increase the rate constant of the fragmentation reaction, shortening the 10-1000 second duration required by the blackbody IR irradiation approach for unmodified polynucleotides.

As noted previously, tandem mass spectrometry is another tool that may be beneficially employed with the methods of this invention. In tandem mass spectrometry, precursor ions with m/z of interest are selected and subjected to activation. Depending on the activation technique used, some or all of the precursor ions can be fragmented to give product ions. When this is done inside a suitable mass spectrometer (e.g., Fourier-transform ion cyclotron resonance mass spectrometers and ion trap mass spectrometers), the product ions with m/z of interest can be further selected and subjected to activation and fragmentation, giving more product ions. The masses of both precursor and product ions can be determined.

To control the degree of fragmentation at different stages of activation, two or more different types of modified nucleotides which, for purposes of discussion, will be called Type I and Type II, with different sensitivity to different activation techniques, could be incorporated (complete replacement of the natural nucleotide) into a target polynucleotide. Such a polynucleotide can be fragmented with high efficiency by a Type I activation technique at every position where Type I modified nucleotides are incorporated. The resulting fragment ions, which still contain Type II modified nucleotides, can then be selected and fragmented by a Type II activation technique to generate a set of sub-fragments from which nucleotide content can be more readily inferred. Such an approach can be useful for variance detection. For example, a 500-mer polynucleotide can be first fragmented into 10-50 fragments using a Type I fragmentation technique. The m/z of each fragment (when compared to the predicted set of fragment masses) will reveal if a variance resides in this fragment. Once fragments containing a variance are identified, the rest of the fragment ions are ejected from the ion-trapping device, while the fragment ions of interest are subjected to activation. By controlling the degree of fragmentation of these fragment ions, a set of smaller DNA fragments can be generated, allowing the order of the nucleotides and the position of the variance to be determined. Compared to the approach involving one type of modified nucleotide and one stage of fragmentation, such an approach has the advantage that the number of experimental steps and the amount of data that needs to be processed are significantly reduced. Compared to the approach involving one type of modified nucleotide but two stages of partial fragmentation, this approach has the advantage that the fragmentation efficiency at the second stage is more controllable, hence reducing the chance of sequence gaps.
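The selection step between the two stages can be illustrated by a simple comparison of observed first-stage fragment masses against the predicted set. The following is a minimal sketch under stated assumptions: the function name, the matching tolerance and all masses are hypothetical, and the actual ejection and isolation of ions is performed by the instrument, not by this code.

```python
def select_precursors(observed, predicted, tol=0.5):
    """Return observed first-stage fragment masses that match no predicted
    mass within tol; these would be retained for second-stage activation."""
    return [m for m in observed
            if not any(abs(m - p) <= tol for p in predicted)]

# Hypothetical predicted and measured first-stage fragment masses (Da).
predicted = [806.0, 1119.0, 1459.0, 1673.0]
observed = [806.1, 1135.0, 1459.2, 1672.8]
print(select_precursors(observed, predicted))  # [1135.0]
```

Fragments that match a predicted mass are treated as unchanged and would be ejected; only the unmatched fragment is carried forward.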

Although the aforementioned schemes of activation can be applied to all kinds of mass spectrometers, ion-trap mass spectrometers (ITMS) and Fourier-transform ion cyclotron resonance mass spectrometers (FT-ICRMS) are particularly suited for the electron capturing, photon activation, and blackbody IR irradiation approaches.

C. Modified Nucleotide Incorporation

Several examples of the polymerase-catalyzed incorporation of a modified nucleotide into polynucleotides are described in the Examples section, below. It may be, however, that one particular polymerase will not incorporate all the modified nucleotides described above, or others like them, which are within the scope of this invention, with the same ease and efficiency. Also, while a particular polymerase may be capable of incorporating one modified nucleotide efficiently, it may be less efficient in incorporating a second modified nucleotide directly adjacent to the first modified nucleotide. Furthermore, currently available polymerases may not be capable of inducing or facilitating cleavage at modified nucleotides or nucleotide pairs, an extremely convenient way to achieve cleavage (see above). There are, however, several approaches to acquiring polymerases that are capable of incorporating the modified nucleotides and contiguous pairs of modified nucleotides of this invention and, potentially, inducing or facilitating specific cleavage at that modified nucleotide or those modified nucleotides.

One approach to finding polymerases with the proper capabilities is to take advantage of the diversity inherent among naturally occurring polymerases including, without limitation, RNA polymerases, DNA polymerases and reverse transcriptases. Naturally occurring polymerases are known to have different affinities for non-natural nucleotides and it is likely that a natural polymerase, which will perform the desired incorporation, can be identified. In some cases, use of a mixture of two or more naturally occurring polymerases having different properties regarding the incorporation of one or more non-natural nucleotides may be advantageous. For example, W. Barnes has reported (Proc. Natl. Acad. Sci. USA, 1994, 91:2216-2220) the use of two polymerases, an exonuclease-free N-terminal deletion mutant of Taq DNA polymerase and a thermostable DNA polymerase having 3′-exonuclease activity, to achieve improved polymerization of long DNA templates. Naturally occurring polymerases from thermophilic organisms are preferred polymerases for applications in which amplification by thermal cycling, e.g., PCR, is the most convenient way to produce modified polynucleotides.

Another approach is to employ current knowledge of polymerase structure-function relationships (see, e.g., Delarue, M., et al., Protein Engineering, 1990, 3:461-467; Joyce, C. M., Proc. Natl. Acad. Sci. USA, 1997, 94:1619-1622) to identify or aid in the rational design of a polymerase which can accomplish a particular modified nucleotide incorporation. For example, the amino acid residues of DNA polymerases that provide specificity for deoxyribo-NTPs (dNTPs, deoxyribo Nucleotide TriPhosphates), while excluding ribo-NTPs (rNTPs), have been examined in some detail. Phenylalanine residue 155 of Moloney Murine Leukemia Virus reverse transcriptase appears to provide a steric barrier that blocks entry of ribo-NTPs. A similar role is played by phenylalanine residue 762 of the Klenow Fragment of E. coli DNA polymerase I, and tyrosine residue 115 of HIV-1 reverse transcriptase. Mutation of this latter amino acid, or its equivalent, in several different polymerases has the effect of altering polymerase fidelity and sensitivity to nucleotide inhibitors.

The corresponding site in RNA polymerases has also been investigated and appears to play a similar role in discriminating ribo- from deoxyribo-nucleotides. For example, it has been shown that mutation of tyrosine 639 of T7 RNA polymerase to phenylalanine reduces the specificity of the polymerase for rNTPs by about 20-fold and almost eliminates the K_m difference between rNTPs and dNTPs. The result is that the mutant T7 RNA polymerase can polymerize a mixed dNTP/rNTP chain. See, e.g., Huang, Y., Biochemistry, 1997, 36:13718-13728. These results illustrate the use of structure-function information in the design of polymerases that will readily incorporate one or more modified nucleotides.

In addition, chemical modification or site-directed mutagenesis of specific amino acids or genetic engineering can be used to create truncated, mutant or chimeric polymerases with particular properties. For example, chemical modification has been used to modify T7 DNA polymerase (Sequenase®, Amersham) to increase its processivity and affinity for non-natural nucleotides (Tabor, S., et al., Proc. Natl. Acad. Sci. USA, 1987, 84:4767-4771). Likewise, site-directed mutagenesis has been employed to examine how E. coli DNA polymerase I (Klenow fragment) distinguishes between deoxy- and dideoxynucleotides (Astatke, M., et al., J. Mol. Biol., 1998, 278:147-165).

Furthermore, development of a polymerase with optimal characteristics can be accomplished by random mutagenesis of one or more known polymerases coupled with an assay that manifests the desired characteristics in the mutated polymerase. A particularly useful procedure for performing such mutagenesis is called “DNA shuffling” (see Harayama, S., Trends Biotechnol., 1998, 16:76-82). For example, using only three rounds of DNA shuffling and assaying for β-lactamase activity, a variant with 16,000-fold higher resistance to the antibiotic cefotaxime than the wild-type gene was created (Stemmer, W. P. C., Nature, 1994, 370:389-391).

A novel procedure, which is a further aspect of this invention, for creating and selecting polymerases capable of efficiently incorporating a modified nucleotide or contiguous pair of modified nucleotides of this invention is described in the Examples section, below.

D. Fragment Analysis

Once a modified nucleotide or nucleotides has been partially or completely substituted for one or more natural nucleotides in a polynucleotide and cleavage of the resultant modified polynucleotide has been accomplished, analysis of the fragments obtained can be performed. This can be accomplished by several means. The mass spectrometric approach discussed in detail herein can be used. Or, if the goal is the detection of a known polymorphism in a known sequence of a polynucleotide, the inter- or intramolecular hybridization procedures, also discussed in detail below, may be used. In fact, if the goal is complete sequencing of a polynucleotide, the above-mentioned partial incorporation of modified nucleotides into a polynucleotide or partial cleavage of a completely modified-nucleotide-substituted polynucleotide may be used to create fragment ladders similar to those obtained when using the classical Maxam-Gilbert or Sanger procedures. In the latter case, a sequencing ladder can then be constructed using slab, capillary or miniaturized gel electrophoresis techniques. The advantage of the method of this invention over the Maxam-Gilbert procedure is that the placement of the modified nucleotides in the modified polynucleotide is precise, as is cleavage, whereas post-synthesis modification of a full-length polynucleotide by the Maxam-Gilbert reactions is susceptible to error. For example, the wrong nucleotides might be modified and thus the wrong cleavage may occur, or the intended nucleotides may not be modified at all such that there may be insufficient, perhaps even no, cleavage where cleavage would be expected to occur. The advantages over the Sanger procedure are several. First, the full-length clone can be purified after extension and prior to cleavage so that prematurely terminated fragments due to stops caused by polymerase error or template secondary structure can be removed before gel electrophoresis, resulting in cleaner cleavage bands. In fact, it may not even be necessary to perform such clean up in that the prematurely terminated polymerase extension fragments themselves will be cleaved if they contain a modified nucleotide and those correctly cleaved fragments will simply augment the other fragments obtained from the cleavage of the full-length clone (although such augmentation is confined to fragments shorter than the site of premature termination). Second, the chemical method produces equal intensity sequence ladder products in contrast to dye-terminator sequencing, where substantial differences in the characteristics of different dye terminator molecules or in the interaction of dye-modified dideoxynucleotides with polymerase template complexes result in an uneven signal intensity in the resulting sequence ladders. Such differences can lead to errors and make heterozygote identification difficult. Third, the chemical methods described herein allow production of homogeneous sequence ladders over distances of multiple kb, in contrast to the Sanger chain terminating method, which generates usefully labeled fragments over a substantially shorter interval. This is demonstrated in FIGS. 17 and 18. The production of long sequence ladders can be coupled with restriction endonuclease digestion to accomplish 1× sequencing of long templates.

The utility of this approach to sequencing genomic DNA is described in FIG. 14 and its execution in FIGS. 15 and 16. These methods have particular utility in the sequencing of repeat-rich genomes such as, without limitation, the human genome.

i. Mass Spectrometric Methods

A particular advantage of the methods described herein for the use of mass spectrometry for polynucleotide sequence determination is the speed, reproducibility, low cost and automation associated with mass spectrometry, especially in comparison to gel electrophoresis. See, e.g., Fu, D. J., et al., Nature Biotechnology, 1998, 16:381-384. Thus, although some aspects of this invention may employ gel analysis, those that use mass spectrometry are preferred embodiments.

When detection of variance between two or more related polynucleotides is the goal, the ability of mass spectrometry to differentiate masses within a few or even one atomic mass unit (amu) of each other permits such detection without the need for determining the complete nucleotide sequences of the polynucleotides being compared; i.e., the masses of the oligonucleotides provide the nucleotide content. The use of mass spectrometry in this manner constitutes yet another aspect of this invention.

This use of mass spectrometry to identify and determine the chemical nature of variances is based on the unique molecular weight characteristics of the four deoxynucleotides and their oligomers.

Table 2 shows the mass differences among the four deoxynucleotide monophosphates. Table 3A then shows the calculated masses of all possible 2-mers, 3-mers, 4-mers and 5-mers by nucleotide composition alone; that is, without consideration of nucleotide order. As can be seen, only two of the 121 possible 2-mer through 5-mer oligonucleotides have the same mass. Thus, the nucleotide composition of all 2-mers, 3-mers, 4-mers and all but two 5-mers created by cleavage of a polynucleotide can be immediately determined by mass spectrometry using an instrument with sufficient resolving power. For the masses in Table 3A, an instrument with a resolution (full width at half-maximal height) of 1500 to 2000 would be sufficient; mass spectrometers with resolution up to 10,000 are commercially available. However, when cleavage is performed at all sites of modified nucleotide substitution, it is not necessary to consider the masses of all possible 2-mers, 3-mers, 4-mers, etc. This is because there can be no internal occurrences of the cleavage nucleotide in any cleavage fragment. That is, if G is the cleavage nucleotide, then all resulting cleavage fragments will have 0 or 1 G, depending on the cleavage mechanism and, if it is 1 G, that G must occur at either the 3′ or the 5′ end of the fragment depending on the cleavage mechanism. Put another way, there cannot be a G internal to a fragment because, if there were, that fragment would necessarily be re-fragmented at the internal G. Thus, if the cleavage chemistry does leave a G on either end of all G-cleavage fragments, then the mass of G can be subtracted from the mass of each fragment and the resulting masses can be compared. The same can be done with A, C and T. Table 4 shows the masses of all 2-mers through 7-mers lacking one nucleotide. This calculation has been performed for polynucleotides up to 30-mers and it has been shown that there are only 8 sets of isobaric oligonucleotides (oligonucleotides with masses within 0.01% of each other) below a mass of 5000 Da. The eight sets of isobaric oligonucleotides are shown in Table 3B. Inspection of Table 3B reveals that every set except Set 2 involves a polynucleotide with multiple G residues. Thus, cleavage at G would eliminate all isobaric masses except one, d(T₈) vs. d(C₃A₅), which could not be resolved by mass spectrometry with a resolution of 0.01%. However, either C or A cleavage would remove the latter polynucleotide.
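The composition count, the single isobaric pair and the resolution figure stated above can be checked with a short enumeration. This is a sketch under stated assumptions, not part of the disclosed method: residue masses are the Table 2 values rounded to whole numbers, and a constant 18 Da is added per oligomer because that reproduces the Table 3A values (presumably one water); the function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# Whole-number residue masses (rounded from Table 2) and the assumed
# constant that reproduces the Table 3A masses.
RESIDUE = {"C": 289, "T": 304, "A": 313, "G": 329}
WATER = 18

def composition_masses(min_len=2, max_len=5, alphabet="CTAG"):
    """Map every unordered nucleotide composition to its whole-number mass."""
    masses = {}
    for n in range(min_len, max_len + 1):
        for comp in combinations_with_replacement(alphabet, n):
            masses["".join(comp)] = sum(RESIDUE[b] for b in comp) + WATER
    return masses

def isobaric_sets(masses):
    """Group compositions that share exactly the same mass."""
    by_mass = defaultdict(list)
    for comp, mass in masses.items():
        by_mass[mass].append(comp)
    return {m: comps for m, comps in by_mass.items() if len(comps) > 1}

def required_resolving_power(masses):
    """Largest m / delta-m over neighbouring distinct masses."""
    ms = sorted(set(masses.values()))
    return max(hi / (hi - lo) for lo, hi in zip(ms, ms[1:]))

table = composition_masses()
print(len(table))                                          # 121
print(isobaric_sets(table))                                # AAAAA and CCGGG at 1583
print(isobaric_sets(composition_masses(alphabet="CTA")))   # {} when G is excluded
print(required_resolving_power(table))
```

Restricting the alphabet to C, T and A models cleavage at G with the terminal G subtracted, which is why the AAAAA/CCGGG collision disappears in the third call.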

Table 4 shows that cleavage at A or T consistently produces fragments with larger mass differences between the closest possible cleavage fragments. Cleavage at A produces mass differences of 5, 10, 15, 20 or 25 Da between the closest fragments while cleavage at T affords mass differences of 8, 16 or 24 Da, albeit at the expense of a few more isobaric fragments.

TABLE 2

Panel A
            dAMP    dCMP    dGMP    dTMP
Mol. wt.    313.2   289.2   329.2   304.2
vs. dAMP    -       24      16      9
vs. dCMP            -       40      15
vs. dGMP                    -       25

Panel B
            dAMP    dCMP    dGMP    dTMP    2-chloroadenineMP
Mol. wt.    313.2   289.2   329.2   304.2   347.7
vs. dTMP                                    42.3
vs. dAMP    -       24      16      9       -
vs. dCMP            -       40      15      57.3
vs. dGMP                    -       25      17.3

In Table 2, Panel A shows the masses of the four deoxynucleotide residues across the top, and the calculated molecular weight differences between each pair of nucleotide residues in the body of the table. Note that chemically modified nucleotides will generally have masses different than those shown above for the natural nucleotides. The mass difference between a particular modified nucleotide and the other nucleotides will vary depending on the modification. See description of specific nucleotide modifications and cleavage mechanisms for details of cleavage products.

Panel B shows the mass differences between the natural nucleotides and 2-chloroadenine (far right column). The smallest mass difference is 17.3 Da instead of 9 Da as in Panel A, providing advantageous discrimination of nucleotides using mass spectrometry. Thus, for a given target analyte polynucleotide, if its sequence is known, it is possible to determine whether cleavage at one or more of the base nucleotides would produce any of the above confounding artifacts and then, by judicious choice of experimental conditions, it is possible to avoid or resolve them.
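For a known sequence, the check for confounding (isobaric) fragments at each candidate cleavage nucleotide can be sketched as follows. The model is deliberately simplified and is an assumption of this example: fragments are taken as the stretches between occurrences of the cleavage nucleotide with that nucleotide omitted, and 61 Da is subtracted per fragment, following the convention stated in the legend to Table 4. The function names and the test sequence are hypothetical.

```python
# Whole-number residue masses (rounded from Table 2).
RESIDUE = {"C": 289, "T": 304, "A": 313, "G": 329}
PHOSPHATE_LOSS = 61  # per-fragment correction used in Table 4

def cleavage_fragments(seq, base):
    """Split seq at every occurrence of base, dropping the base itself."""
    return [frag for frag in seq.split(base) if frag]

def fragment_mass(frag):
    return sum(RESIDUE[b] for b in frag) - PHOSPHATE_LOSS

def confounded_masses(seq, base):
    """Masses shared by fragments of differing nucleotide composition."""
    by_mass = {}
    for frag in cleavage_fragments(seq, base):
        composition = "".join(sorted(frag))
        by_mass.setdefault(fragment_mass(frag), set()).add(composition)
    return {m: comps for m, comps in by_mass.items() if len(comps) > 1}

seq = "CCGGGTAAAAATCC"  # hypothetical target sequence
for base in "GCAT":
    print(base, confounded_masses(seq, base))
```

In this example only cleavage at T yields two fragments of different composition with the same mass, so cleavage at G, C or A would be the better choice for this sequence.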

Based on the preceding analysis, it can be seen that any difference in the nucleotide sequence among two or more similar polynucleotides from different members of a population will result in a difference in the pattern of fragments obtained by cleavage of the polynucleotides and thus a difference in the masses seen in the mass spectrogram. Every variance will result in two mass changes, the disappearance of a mass and the appearance of a new mass. In addition, if a double-stranded polynucleotide is being analyzed or if two strands are being analyzed independently, the variance will result in a change in mass of the two complementary strands of a target DNA, resulting in four mass changes altogether (a mass disappearance and a mass appearance in each strand). The presence of a second strand displaying mass changes provides a useful internal corroboration of the presence of a variance. In addition, the sets of mass changes in fragments from complementary strands can provide additional information regarding the nature of the variance. FIGS. 27-30 exemplify the detection of a mass difference on both strands of a polynucleotide after full substitution and cleavage at modified dA, a variant position in the transferrin receptor gene. Table 5 shows the sets of mass changes expected on complementary strands for all possible point mutations (transitions and transversions). Once the mass spectrogram is obtained, it will be immediately apparent whether the variance was an addition of one or more nucleotides to a fragment (an approximately 300+ amu increase in fragment mass), deletion of one or more nucleotides from a fragment (approximately a 300+ amu decrease in fragment mass) or a substitution of one or more nucleotides for one or more other nucleotides (differences as shown in Table 5). Furthermore, if the variance is a substitution, the exact nature of that substitution can also be ascertained.
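The pattern of one mass disappearing and one appearing on each strand can be illustrated with a small comparison of predicted fragment masses. This sketch rests on assumptions that are not part of the disclosure: fragments are modeled as the stretches between occurrences of the cleavage nucleotide (which is omitted), whole-number residue masses are used with the 61 Da per-fragment correction of Table 4, and the sequences are hypothetical.

```python
from collections import Counter

RESIDUE = {"C": 289, "T": 304, "A": 313, "G": 329}
PHOSPHATE_LOSS = 61
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def fragment_masses(seq, base):
    """Multiset of fragment masses after cleavage at every `base`."""
    return Counter(sum(RESIDUE[b] for b in frag) - PHOSPHATE_LOSS
                   for frag in seq.split(base) if frag)

def mass_changes(reference, variant, base):
    """Return (disappeared masses, appeared masses) between two alleles."""
    ref = fragment_masses(reference, base)
    var = fragment_masses(variant, base)
    return sorted((ref - var).elements()), sorted((var - ref).elements())

reference = "CTAGCTTAGC"   # hypothetical reference allele
variant = "CTAGTTTAGC"     # same sequence with a C-to-T substitution

print(mass_changes(reference, variant, "A"))          # ([1165], [1180])
print(mass_changes(reverse_complement(reference),
                   reverse_complement(variant), "A"))  # ([861], [532])
```

On the first strand the 15 Da shift corresponds to the C/T mass difference; on the complementary strand the substitution creates a new A cleavage site, so a fragment is split rather than shifted. Together the two strands give the four mass changes described above.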
TABLE 3a

2mer    mass    3mer    mass    4mer    mass    5mer    mass
CC       596    CCC      885    CCCC    1174    CCCCC   1463
CT       611    CCT      900    CCCT    1189    CCCCT   1478
AC       620    CCA      909    CCCA    1198    CCCCA   1487
TT       626    CTT      915    CCTT    1204    CCCTT   1493
AT       635    CTA      924    CCTA    1213    CCCTA   1502
CG       636    CCG      925    CCCG    1214    CCCCG   1503
AA       644    TTT      930    CTTT    1219    CCTTT   1508
GT       651    CAA      933    CCAA    1222    CCCAA   1511
AG       660    TTA      939    CTTA    1228    CCTTA   1517
GG       676    CTG      940    CCTG    1229    CCCTG   1518
                TAA      948    TTTT    1234    CTTTT   1523
                CGA      949    CAAT    1237    CCTAA   1526
                TTG      955    CCAG    1238    CCCGA   1527
                AAA      957    TTTA    1243    CTTTA   1532
                TGA      964    CTTG    1244    CCTTG   1533
                CGG      965    CAAA    1246    CCAAA   1535
                AAG      973    TTAA    1252    TTTTT   1538
                TGG      980    CTAG    1253    CTTAA   1541
                GGA      989    CCGG    1254    CCTGA   1542
                GGG     1005    TTTG    1259    CCCGG   1543
                                TAAA    1261    TTTTA   1547
                                CAAG    1262    CTTTG   1548
                                TTAG    1268    CAATA   1550
                                CTGG    1269    CCAGA   1551
                                AAAA    1270    TTTAA   1556
                                TAAG    1277    CTTGA   1557
                                CAGG    1278    CCTGG   1558
                                TTGG    1284    CAAAA   1559
                                AAAG    1286    TTTTG   1563
                                TAGG    1293    TTAAA   1565
                                CGGG    1294    CTAGA   1566
                                AAGG    1302    CCGGA   1567
                                TGGG    1309    TTTGA   1572
                                AGGG    1318    CTTGG   1573
                                GGGG    1334    TAAAA   1574
                                                CAAAG   1575
                                                TTAAG   1581
                                                CTGGA   1582
                                                AAAAA   1583
                                                CCGGG   1583
                                                TTTGG   1588
                                                TAAAG   1590
                                                CAAGG   1591
                                                ATTGG   1597
                                                CTGGG   1598
                                                AAAAG   1599
                                                TAAGG   1606
                                                ACGGG   1607
                                                TTGGG   1613
                                                AAAGG   1615
                                                ATGGG   1622
                                                CGGGG   1623
                                                AAGGG   1631
                                                TGGGG   1638
                                                AGGGG   1647
                                                GGGGG   1663

Table 3a shows the masses of all possible compositions of 2mers, 3mers, 4mers and 5mers in order of mass in Daltons (Da), rounded to the nearest whole number for ease of presentation. (Other nucleotide orders are possible for many of the oligonucleotides.) Note that two 5mers with different nucleotide content have the same mass (AAAAA and CCGGG, both 1583 Da). The molecular masses are provided; ionization will change the masses. More generally, these masses are illustrative; actual masses will differ depending on the chemical modification, cleavage mechanism and polarity of ionization.

TABLE 3b

         Polynucleotides    Masses
Set 1    d(C₂G₃)            1566.016
         d(A₅)              1566.068
Set 2    d(C₅G₃)            2433.584
         d(T₈)              2433.603
         d(C₃A₅)            2433.636
Set 3    d(A₁G₇)            2617.707
         d(C₈T₁)            2617.711
Set 4    d(C₁₀T₁)           3196.090
         d(G₁₀)             3196.137
Set 5    d(C₆T₁A₄)          3292.134
         d(C₁₃)             3292.190
Set 6    d(C₁₃)             3759.457
         d(T₇A₁G₄)          3759.472
Set 7    d(C₅T₉)            4183.751
         d(A₆G₇)            4183.779
Set 8    d(T₇G₇)            4433.899
         d(C₁₁A₄)           4433.936

TABLE 4 (part 1)

Cleavage at G      Cleavage at C      Cleavage at A      Cleavage at T
2mer    mass   Δ   2mer    mass   Δ   2mer    mass   Δ   2mer    mass   Δ
CC       517       TT       547       CC       517       CC       517
CT       532  15   AT       556   9   CT       532  15   AC       541  24
AC       541   9   AA       565   9   TT       547  15   CG       557  16
TT       547   6   GT       572   7   CG       557  10   AA       565   8
AT       556   9   AG       581   9   GT       572  15   AG       581  16
AA       565   9   GG       597  16   GG       597  25   GG       597  16

3mer    mass   Δ   3mer    mass   Δ   3mer    mass   Δ   3mer    mass   Δ
CCC      806       TTT      851       CCC      806       CCC      806
CCT      821  15   TTA      860   9   CCT      821  15   CCA      830  24
CCA      830   9   TAA      869   9   CTT      836  15   CCG      846  16
CTT      836   6   TTG      876   7   CCG      846  10   CAA      854   8
CTA      845   9   AAA      878   2   TTT      851   5   CGA      870  16
TTT      851   6   TGA      885   7   CTG      861  10   AAA      878   8
CAA      854   3   AAG      894   9   TTG      876  15   CGG      886   8
TTA      860   6   TGG      901   7   CGG      886  10   AAG      894   8
TAA      869   9   GGA      910   9   TGG      901  15   GGA      910  16
AAA      878   9   GGG      926  16   GGG      926  25   GGG      926  16

4mer    mass   Δ   4mer    mass   Δ   4mer    mass   Δ   4mer    mass   Δ
CCCC    1095       TTTT    1155       CCCC    1095       CCCC    1095
CCCT    1110  15   TTTA    1164   9   CCCT    1110  15   CCCA    1119  24
CCCA    1119   9   TTAA    1173   9   CCTT    1125  15   CCCG    1135  16
CCTT    1125   6   TTTG    1180   7   CCCG    1135  10   CCAA    1143   8
CCTA    1134   9   TAAA    1182   2   CTTT    1140   5   CCAG    1159  16
CTTT    1140   6   TTAG    1189   7   CCTG    1150  10   CAAA    1167   8
CCAA    1143   3   AAAA    1191   2   TTTT    1155   5   CCGG    1175   8
CTTA    1149   6   TAAG    1198   7   CTTG    1165  10   CAAG    1183   8
TTTT    1155   6   TTGG    1205   7   CCGG    1175  10   AAAA    1191   8
CAAT    1158   3   AAAG    1207   2   TTTG    1180   5   CAGG    1199   8
TTTA    1164   6   TAGG    1214   7   CTGG    1190  10   AAAG    1207   8
CAAA    1167   3   AAGG    1223   9   TTGG    1205  15   CGGG    1215   8
TTAA    1173   6   TGGG    1230   7   CGGG    1215  10   AAGG    1223   8
TAAA    1182   9   AGGG    1239   9   TGGG    1230  15   AGGG    1239  16
AAAA    1191   9   GGGG    1255  16   GGGG    1255  25   GGGG    1255  16

5mer    mass   Δ   5mer    mass   Δ   5mer    mass   Δ   5mer    mass   Δ
CCCCC   1384       TTTTT   1459       CCCCC   1384       CCCCC   1384
CCCCT   1399  15   TTTTA   1468   9   CCCCT   1399  15   CCCCA   1408  24
CCCCA   1408   9   TTTAA   1477   9   CCCTT   1414  15   CCCCG   1424  16
CCCTT   1414   6   TTTTG   1484   7   CCCCG   1424  10   CCCAA   1432   8
CCCTA   1423   9   TTAAA   1486   2   CCTTT   1429   5   CCCGA   1448  16
CCTTT   1429   6   TTTGA   1493   7   CCCTG   1439  10   CCAAA   1456   8
CCCAA   1432   3   TAAAA   1495   2   CTTTT   1444   5   CCCGG   1464   8
CCTTA   1438   6   TTAAG   1502   7   CCTTG   1454  10   CCAGA   1472   8
CTTTT   1444   6   AAAAA   1504   2   TTTTT   1459   5   CAAAA   1480   8
CCTAA   1447   3   TTTGG   1509   5   CCCGG   1464   5   CCGGA   1488   8
CTTTA   1453   6   TAAAG   1511   2   CTTTG   1469   5   CAAAG   1496   8
CCAAA   1456   3   ATTGG   1518   7   CCTGG   1479  10   AAAAA   1504   8
TTTTT   1459   3   AAAAG   1520   2   TTTTG   1484   5   CCGGG   1504   0
CTTAA   1462   3   TAAGG   1527   7   CTTGG   1494  10   CAAGG   1512   8
TTTTA   1468   6   TTGGG   1534   7   CCGGG   1504  10   AAAAG   1520   8
CAATA   1471   3   AAAGG   1536   2   TTTGG   1509   5   ACGGG   1528   8
TTTAA   1477   6   ATGGG   1543   7   CTGGG   1519  10   AAAGG   1536   8
CAAAA   1480   3   AAGGG   1552   9   TTGGG   1534  15   CGGGG   1544   8
TTAAA   1486   6   TGGGG   1559   7   CGGGG   1544  10   AAGGG   1552   8
TAAAA   1495   9   AGGGG   1568   9   TGGGG   1559  15   AGGGG   1568  16
AAAAA   1504   9   GGGGG   1584  16   GGGGG   1584  25   GGGGG   1584  16

TABLE 4 (part 2)

Cleavage at G      Cleavage at C      Cleavage at A      Cleavage at T
6mer    mass   Δ   6mer    mass   Δ   6mer    mass   Δ   6mer    mass   Δ
CCCCCC  1673       TTTTTT  1763       CCCCCC  1673       CCCCCC  1673
CCCCCT  1688  15   TTTTTA  1772   9   CCCCCT  1688  15   CCCCCA  1697  24
CCCCCA  1697   9   TTTTAA  1781   9   CCCCTT  1703  15   CCCCCG  1713  16
CCCCTT  1703   6   TTTTTG  1788   7   CCCCCG  1713  10   CCCCAA  1721   8
CCCCTA  1712   9   TTTAAA  1790   2   CCCTTT  1718   5   CCCCAG  1737  16
CCCTTT  1718   6   TTTTAG  1797   7   CCCCTG  1728  10   CCCAAA  1745   8
CCCCAA  1721   3   TTAAAA  1799   2   CCTTTT  1733   5   CCCCGG  1753   8
CCCTTA  1727   6   TTTAAG  1806   7   CCCTTG  1743  10   CCCAAG  1761   8
CCTTTT  1733   6   TAAAAA  1808   2   TTTTTC  1748   5   CCAAAA  1769   8
CCCTAA  1736   3   TTTTGG  1813   5   CCCCGG  1753   5   CCCGGA  1777   8
CCTTTA  1742   6   TTAAAG  1815   2   CCTTTG  1758   5   CCAAAG  1785   8
CCCAAA  1745   3   AAAAAA  1817   2   TTTTTT  1763   5   CCCGGG  1793   8
TTTTTC  1748   3   TTTGGA  1822   5   CCCTGG  1768   5   CAAAAA  1793   0
CCTTAA  1751   3   AAAAGT  1824   2   TTTTCG  1773   5   CCAAGG  1801   8
CTTTTA  1757   6   TTAAGG  1831   7   CCTTGG  1783  10   CAAAAG  1809   8
CCAAAT  1760   3   AAAAAG  1833   2   TTTTTG  1788   5   CCGGGA  1817   8
TTTTTT  1763   3   TTTGGG  1838   5   CCCGGG  1793   5   AAAAAA  1817   0
CTTTAA  1766   3   AAAGGT  1840   2   TTTCGG  1798   5   AAACGG  1825   8
CCAAAA  1769   3   ATTGGG  1847   7   CCTGGG  1808  10   AAAAAG  1833   8
TTTTTA  1772   3   AAAAGG  1849   2   TTTTGG  1813   5   CCGGGG  1833   0
CTTAAA  1775   3   TAAGGG  1856   7   TTCGGG  1823  10   AACGGG  1841   8
TTTTAA  1781   6   TTGGGG  1863   7   CCGGGG  1833  10   AAAAGG  1849   8
TAAAAC  1784   3   AAAGGG  1865   2   TTTGGG  1838   5   ACGGGG  1857   8
TTTAAA  1790   6   AGGGGT  1872   7   TGGGGC  1848  10   AAAGGG  1865   8
CAAAAA  1793   3   AAGGGG  1881   9   TTGGGG  1863  15   GGGGGC  1873   8
TTAAAA  1799   6   GGGGGT  1888   7   GGGGGC  1873  10   AAGGGG  1881   8
TAAAAA  1808   9   AGGGGG  1897   9   GGGGGT  1888  15   AGGGGG  1897  16
AAAAAA  1817   9   GGGGGG  1913  16   GGGGGG  1913  25   GGGGGG  1913  16

7mer    mass   Δ   7mer    mass   Δ   7mer    mass   Δ   7mer    mass   Δ
CCCCCCC 1962       TTTTTTT 2067       CCCCCCC 1962       CCCCCCC 1962
CCCCCCT 1977  15   TTTTTTA 2076   9   CCCCCCT 1977  15   CCCCCCA 1986  24
CCCCCCA 1986   9   TTTTTAA 2085   9   CCCCCTT 1992  15   CCCCCCG 2002  16
CCCCCTT 1992   6   TTTTTTG 2092   7   CCCCCCG 2002  10   CCCCCAA 2010   8
CCCCCTA 2001   9   TTTTAAA 2094   2   CCCCTTT 2007   5   CCCCCGA 2026  16
CCCCTTT 2007   6   TTTTTGA 2101   7   CCCCCTG 2017  10   CCCCAAA 2034   8
CCCCCAA 2010   3   TTTAAAA 2103   2   CCCTTTT 2022   5   CCCCCGG 2042   8
CCCCTTA 2016   6   TTTTAAG 2110   7   CCCCTTG 2032  10   CCCCAAG 2050   8
CCCTTTT 2022   6   TTAAAAA 2112   2   CCTTTTT 2037   5   CCCAAAA 2058   8
CCCCTAA 2025   3   GGTTTTT 2117   5   CCCCCGG 2042   5   CCCCGGA 2066   8
CCCTTTA 2031   6   TTTAAAG 2119   2   CCCTTTG 2047   5   CCCAAAG 2074   8
CCCCAAA 2034   3   TAAAAAA 2121   2   CTTTTTT 2052   5   CCAAAAA 2082   8
CCTTTTT 2037   3   TTTTGGA 2126   5   CCCCTGG 2057   5   CCCCGGG 2082   0
CCCTTAA 2040   3   TTAAAGA 2128   2   CCTTTTG 2062   5   CCCGGAA 2090   8
CCTTTTA 2046   6   AAAAAAA 2130   2   TTTTTTT 2067   5   CCAAAAG 2098   8
CCCAAAT 2049   3   TTTGGAA 2135   5   CCCTTGG 2072   5   CCCGGGA 2106   8
CTTTTTT 2052   3   AAAAAGT 2137   2   CTTTTTG 2077   5   CAAAAAA 2106   0
CCTTTAA 2055   3   GGGTTTT 2142   5   CCCCGGG 2082   5   CCAAAGG 2114   8
CCCAAAA 2058   3   TTAAAGG 2144   2   CTTTCGG 2087   5   CAAAAAG 2122   8
TTTTTCA 2061   3   AAAAAAG 2146   2   GTTTTTT 2092   5   CCCGGGG 2122   0
CCTTAAA 2064   3   TTTGGGA 2151   5   CCCTGGG 2097   5   CCGGGAA 2130   8
TTTTTTT 2067   3   AAAAGGT 2153   2   CTTTTGG 2102   5   AAAAAAA 2130   0
TTTTAAC 2070   3   AATTGGG 2160   7   CCTTGGG 2112  10   AAAACGG 2138   8
TAAAACC 2073   3   AAAAAGG 2162   2   GGTTTTT 2117   5   AAAAAAG 2146   8
ATTTTTT 2076   3   GGGGTTT 2167   5   CCCGGGG 2122   5   CCGGGGA 2146   0
TTTAAAC 2079   3   TAAAGGG 2169   2   CTTTGGG 2127   5   AAACGGG 2154   8
CCAAAAA 2082   3   TTGGGGA 2176   7   TGGGGCC 2137  10   AAAAAGG 2162   8
AATTTTT 2085   3   AAAAGGG 2178   2   GGGTTTT 2142   5   CCGGGGG 2162   0
CTTAAAA 2088   3   AAGGGGT 2185   7   CTTGGGG 2152  10   AACGGGG 2170   8
AAATTTT 2094   6   GGGGGTT 2192   7   GGGGGCC 2162  10   AAAAGGG 2178   8
CTAAAAA 2097   3   AAAGGGG 2194   2   GGGGTTT 2167   5   AGGGGGC 2188   8
AAAATTT 2103   6   AGGGGGT 2201   7   GGGGGTC 2177  10   AAAGGGG 2194   8
CAAAAAA 2106   3   AAGGGGG 2210   9   GGGGGTT 2192  15   CGGGGGG 2202   8
AAAAATT 2112   6   GGGGGGT 2217   7   CGGGGGG 2202  10   AAGGGGG 2210   8
AAAAAAT 2121   9   AGGGGGG 2226   9   GGGGGGT 2217  15   AGGGGGG 2226  16
AAAAAAA 2130   9   GGGGGGG 2242  16   GGGGGGG 2242  25   GGGGGGG 2242  16

Part 1 of Table 4 shows the masses resulting from cleavage of oligonucleotides at specific nucleotides. Cleavage at G produces fragments with no internal G residues but, depending on the cleavage mechanism, there may be a G at the 5′ or 3′ end of a fragment. In the table, G has been omitted from the G cleavage fragments for ease of representation (thus each fragment could be considered one nucleotide longer). Of course, the result is the same for C, A or T cleavage. Nucleotide masses were rounded to the nearest whole number. The mass of one phosphate group, 61 daltons, was subtracted from each fragment since most cleavage reactions result in the loss of one phosphate group.

Part 2 of Table 4 shows the masses resulting from cleavage of oligonucleotides at specific nucleotides (G, C, A or T, as indicated). See legend to part 1 of this Table. Note that the two 5mers with the same T cleavage mass (part 1) continue to propagate through the T cleavage masses.

ii. Hybridization Methods

While the means of detection may vary for each of the hybridization-based methods discussed below, they all share the same preliminary steps of PCR amplification of the region of DNA surrounding the polymorphism using one or more modified, cleavable nucleotides followed by chemical cleavage at the site(s) of incorporation of the cleavable nucleotide. The resulting fragments may be immobilized by using an immobilized PCR primer, by immobilizing the fragments of the cleavage or by subsequent hybridization of the fragments with an anchored oligonucleotide. The primer or oligonucleotide may be anchored to any type of solid support such as, without limitation, a chip, a bead or a filter. Numerous such solid supports are known in the art and are within the scope of this invention.

Once the amplified product has been chemically cleaved and immobilized, detection of a target polymorphism can be accomplished in any number of ways. Virtually any method of detection known in the art, such as radiolabeling and fluorescence detection, may be employed; ways to implement any of these techniques will become apparent based on the disclosures herein; all such procedures are within the scope of this invention. A presently preferred technique involves fluorescence, both single dye and FRET.

A label may be incorporated in the amplified regions of nucleic acid sequence by using a radioactively or fluorescently labeled nucleotide that does not interfere with the amplification reaction, cleavage or with subsequent hybridization conditions or label detection. The labeled nucleotide may be a modified, cleavable nucleotide of this invention or it may be a nucleotide that, other than being labeled, is naturally occurring.

A label can also be incorporated during the cleavage reaction, using, for example, a labeled secondary amine or a labeled TCEP molecule. The use of secondary amines is shown in FIG. 38. The use of TCEP is described above and shown in Scheme 4. As shown, the product of cleavage with TCEP and base is unique and results in a phosphate-ribose-TCEP adduct at the 3′ end of the cleavage fragment and a phosphate moiety at the 5′ end. Thus, the use of a labeled TCEP (or other phosphine) derivative provides direct, unambiguous labeling of cleavage fragments.

Of course, it is possible to perform a TCEP cleavage and label the fragments afterwards, either using substituents on the TCEP moiety attached to the fragment or any of the other means described herein or known in the art.

Incorporation of a 3′-SH modified nucleotide into the region of interest of the DNA sample surrounding the SNP would also provide a convenient labeling site. Chemical cleavage of such a nucleotide results in the primary SH group remaining in the sugar portion of the residue at the site of cleavage. A primary SH group is quite reactive and can be labeled with iodoacetamides or maleimides that in turn are radiolabeled or are substituted with fluorescent molecules.

Alternatively, one or both of the primers used in PCR amplification can be labeled. Proper selection of the primer will result in fragments after chemical cleavage that still contain the labeled primer region.

a. Detection by Differential Melting Temperature

This method of detection is shown schematically in FIG. 33. The region surrounding the SNP of interest is amplified by PCR using a modified, cleavable nucleotide corresponding to the SNP nucleotide. For example, if the known SNP is a dATP, then a modified dATP is used in the PCR, as shown in FIG. 33A. Modified dATP is thus incorporated at each position that would normally be occupied by an unmodified dATP. One of the PCR primers is designed such that the first modified dATP residue incorporated after the primer corresponds to the SNP.

As with the other methods described herein, at some point, a detectable label is incorporated into the system, either by use of a labeled primer, a labeled nucleotide, a labeled ribonucleotide, a labeled, modified nucleotide or a labeled, modified ribonucleotide. Furthermore, a label may be incorporated during the cleavage reaction using a labeled TCEP or a labeled secondary amine. Alternatively, a label may be incorporated after selective hybridization has occurred, i.e. after the temperature has been raised to a degree whereby at least one of the fragments dissociates from the oligonucleotide probe.

The resulting PCR products are then cleaved at all points of occurrence of the incorporated modified nucleotide. The pattern of cleavage fragments obtained from one allele will be different from that of the other allele, as shown in FIG. 33A where cleavage of the A/T allele affords a different pattern than cleavage of the G/C allele.

The cleavage products are hybridized to oligonucleotide probes designed to maximize the difference in hybridization signal obtained from the two different alleles. For example, the probe shown in FIG. 33A, consisting of the sequence 3′-XXXXXXXXGAGACACT-5′, will hybridize more stably to the 5′-fragment from the G/C allele than to the corresponding fragment from the A/T allele due to the formation of four more base-pairs. That is, the duplex formed by the probe and the G/C allele fragment will have a melting temperature detectably higher than the probe-A/T duplex. For optimal detection of single-base pair mismatches, a 1° to 10° C. difference in melting temperature is presently preferred. When the temperature is raised above the melting temperature of a fragment-oligonucleotide duplex corresponding to one of the alleles, that allele will dissociate. The remaining fragment-oligonucleotide duplexes can then be analyzed for the incorporated label that identifies the polymorphism.
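
The effect of additional base pairs on duplex stability can be illustrated with a rough calculation. The sketch below uses the Wallace rule of thumb for short oligonucleotides (2° C. per A/T pair, 4° C. per G/C pair); that rule is an assumption brought in for illustration and is not part of this disclosure, and the sequences are hypothetical rather than those of FIG. 33A.

```python
def wallace_tm(paired_bases: str) -> int:
    """Rough melting temperature (deg C) of a short duplex.

    Wallace rule of thumb: 2 deg C per A/T pair plus 4 deg C per G/C pair.
    Pass only the bases of one strand that are actually paired with the probe.
    """
    seq = paired_bases.upper()
    at_pairs = seq.count("A") + seq.count("T")
    gc_pairs = seq.count("G") + seq.count("C")
    if at_pairs + gc_pairs != len(seq):
        raise ValueError("sequence may contain only A, C, G and T")
    return 2 * at_pairs + 4 * gc_pairs


# Hypothetical fragments: the longer one pairs with four more probe bases.
short_fragment = "CTCTGTGA"
long_fragment = short_fragment + "ACGT"

tm_short = wallace_tm(short_fragment)
tm_long = wallace_tm(long_fragment)
print(f"short duplex: {tm_short} C, long duplex: {tm_long} C, "
      f"difference: {tm_long - tm_short} C")
```

A wash or annealing temperature chosen between the two estimates would be expected to retain only the longer duplex; in practice the temperature would be determined empirically.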

The above procedure provides a powerful method for identifying the presence of one SNP allele in a diploid DNA sample but it does not provide information about the other allele; i.e., a (G/C)(G/C) homozygote and a (G/C)(A/T) heterozygote would both produce a strong hybridization signal with the probe oligonucleotide, whereas an (A/T)(A/T) homozygote would produce a weak signal. In order to obtain positive identification of the alternate allele, the procedure shown in FIG. 33A is repeated using the SNP nucleotide of the other allele, in the example shown in FIG. 33A, dGTP. One of the PCR primers is again selected such that the first modified nucleotide incorporated following the primer corresponds to the variable site. The PCR product is then subjected to cleavage at each occurrence of the modified nucleotide to give the set of fragments shown in FIG. 33B. As above, the cleavage products are hybridized to an oligonucleotide probe designed to maximize the difference in hybridization signal obtained from the two different alleles. In FIG. 33B, the probe selected has the sequence 5′-XXXXXXXXXGAGATACT-3′. Here, it is the A/T-probe duplex that is more stable, that is, that will have the higher melting point, due to the six additional base pairs formed in the duplex. This difference in melting points can be exploited in two ways: all fragments can be annealed at a low temperature and then the temperature can be raised to a point above the melting point of the G/C duplex, which will then fall apart leaving only the A/T duplex to be detected; or annealing can be performed at a temperature above the melting temperature of the G/C duplex, which then will not anneal at all.

In the above example, as shown in FIG. 33B, (A/T)(A/T) homozygotes will give strong signal with probe 2 but not probe 1; (G/C)(G/C) homozygotes give strong signal with probe 1 but not probe 2; and (A/T)(G/C) heterozygotes give strong signal with both probes.
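
The two-probe readout just described amounts to a small decision rule. The following is a minimal sketch of that rule; the function name, the normalized signal scale and the 0.5 threshold are hypothetical choices made for illustration, and a real assay would set its cut-off from controls.

```python
def call_genotype(probe1_signal: float, probe2_signal: float,
                  threshold: float = 0.5) -> str:
    """Call a genotype from the signals retained on probe 1 and probe 2.

    Probe 1 retains the G/C-allele fragment and probe 2 retains the
    A/T-allele fragment. Signals are assumed to be normalized to 0..1,
    and `threshold` is a hypothetical cut-off for a "strong" signal.
    """
    strong1 = probe1_signal >= threshold
    strong2 = probe2_signal >= threshold
    if strong1 and strong2:
        return "(A/T)(G/C) heterozygote"
    if strong1:
        return "(G/C)(G/C) homozygote"
    if strong2:
        return "(A/T)(A/T) homozygote"
    return "no call"


for signals in [(0.9, 0.1), (0.1, 0.9), (0.8, 0.7), (0.1, 0.1)]:
    print(signals, "->", call_genotype(*signals))
```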

It is presently preferred that the oligonucleotide probes used in the above assays be immobilized on a solid support such as, without limitation, microchips, microbeads, glass slides or any other such matrix, all of which are within the scope of this invention.

The PCR primer nearest to the SNP, and the probe oligonucleotide, are both designed to maximize the difference in the number of paired bases in the DNA duplexes formed between the probe and each of the two SNP alleles. Depending on the fragment patterns produced after chemical cleavage, the capture probe oligonucleotide may completely overlap with the 5′ primer, as in FIGS. 33A, B and C, or may partially overlap as illustrated in FIG. 34.

Alternatively, the capture probe may be designed to hybridize to an internal fragment, rather than the 5′ fragment, as shown in FIG. 35.

In any of the procedures herein that involve label incorporation during PCR, other than by means of labeled primers, incorporation will take place in the 5′ to 3′ direction as well as the 3′ to 5′ direction. If the subsequent cleavage reaction does not result in fragments that are small enough, or that are not hybridizable to the fragment containing the site of polymorphism, i.e. the identification fragment, some sample clean-up will be required. Sample clean-up methods to remove potential labeled fragments interfering with label detection include, but are not limited to, specific hybridization to an oligonucleotide or polynucleotide sequence on a solid support, filtration, or slab gel electrophoresis with detection of the separable hybridized duplexes, structures, or bands using a fluorimeter or other detection device.

b. Detection Based on Incorporation of Modified Nucleotides

1. Modified Nucleotide/Labeled Nucleotide Method

The region surrounding the SNP of interest is amplified by PCR in the presence of a modified nucleotide and a labeled nucleotide (for example Gm and A* in FIG. 36). Cleavage of the PCR amplification products at the sites of modified nucleotide incorporation results in fragments whose size is dependent on the presence or absence of an allele of the SNP, as shown in FIG. 36. There, modified dGTP is added to the PCR reaction mixture in place of naturally occurring dGTP and is thus incorporated at each position that would normally be occupied by an unmodified dGTP. The labeled nucleotide that is incorporated (dA*TP in FIG. 36) is one that does not correspond to one of the two possible alleles and that is not present in the sequence between the 3′ end of the primer and the location of the SNP nucleotide. The labeled nucleotide is not cleavable under the cleavage conditions selected.

If incorporation of the labeled nucleotide (dA*TP) reduces the PCR amplification to an unacceptable level, the dA*TP can be mixed with unlabeled dATP to allow for adequate amplification to occur. Partial incorporation of the labeled nucleotide is sufficient to achieve acceptable signal for subsequent detection.

The resulting PCR product is then specifically cleaved at all sites of incorporation of the modified nucleotide analog (Gm). The pattern of cleavage fragments obtained will vary between the two alleles depending on the nucleotide present at the SNP site. Furthermore, the fragment associated with the primer can either be labeled or unlabeled. In FIG. 36, the fragment from G allele cleavage will have a labeled nucleotide whereas the T allele cleavage fragment will not.

The cleavage products are hybridized to an oligonucleotide probe that is the complement of the PCR primer associated with the SNP. The cleavage product from both alleles will hybridize to the oligonucleotide probe; however, only the product with the non-cleavable base at the SNP site (the T allele in FIG. 36) will afford a detectable signal.

The above procedure can be repeated to detect the T allele. This is shown in FIG. 37. Probing the sample for the G or T allele separately allows determination of whether a sample is homozygous G/G, homozygous T/T, or heterozygous G/T at the polymorphic site within the DNA sample and ultimately establishes the relevant gene sequence.

2. Detection by Fluorescence Resonance Energy Transfer (FRET)

FIG. 39 depicts the determination of an SNP using chemical cleavage followed by FRET. First, PCR amplification using one modified nucleotide (dAmTP is shown in FIG. 39) incorporates the modified base at all sites, including the polymorphic site, in the amplified region of DNA. The chemical cleavage reaction is then carried out in the presence of TCEP or a secondary amine alone. The TCEP or secondary amine can be tagged with a fluorescing dye prior to the cleavage reaction or the dye can be added after cleavage is done. In FIG. 39, the F1 label is shown as being attached to both the A and G allele fragments.

The fragments carrying the dyes are then hybridized to oligonucleotide probes that also carry dye molecules, designated F2 and F3 in FIG. 39. F2 and F3 are selected for optimal separation of the F1-F2 and F1-F3 FRET emission spectra. A FRET emission will be detected only when the fluorophores are within close enough proximity. Thus, in FIG. 39, no, or a reduced, FRET emission would occur with the G allele fragment using probe 1 because the two fluorophores (the dye molecules are often referred to as fluorophores) are not sufficiently close to one another for efficient energy transfer. Similarly, the A allele fragment is not detected using probe 2, because the F1 and F3 fluorophores are distant from each other. Conversely, the A allele/probe 1 and the G allele/probe 2 duplexes would result in detectable FRET signals because the two fluorophores are in close proximity to one another. Fluorophores F2 and F3 may be the same or different molecules.
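
The distance dependence that makes this scheme work can be sketched numerically. The example below uses the standard Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor separation and R0 is the separation at which transfer is 50% efficient. The relation itself is textbook photophysics; the 5 nm default for R0 and the example distances are illustrative assumptions, not values taken from this disclosure.

```python
def fret_efficiency(distance_nm: float, r0_nm: float = 5.0) -> float:
    """Forster energy-transfer efficiency for a donor-acceptor pair.

    distance_nm: separation between the two fluorophores.
    r0_nm: Forster radius of the dye pair (5.0 nm is a placeholder value).
    """
    if distance_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and r0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


# Dyes brought together by hybridization versus dyes held far apart.
for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r):.3f}")
```

Because efficiency falls with the sixth power of distance, a fragment whose label sits only a few nanometers farther from the probe dye gives little or no acceptor emission, which is the basis for distinguishing the two alleles.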

Alternatively, if the donor and acceptor molecules are within FRET distance from one another, differential emission patterns may be used to identify the oligonucleotide probe/fragment duplexes. That is, samples may be irradiated at the donor F1 excitation wavelength and the emission wavelength of F2 or F3 fluorescence may be observed. In this manner, the four possible duplexes representing heterozygous alleles within the same sample may be identified. For example, the FRET detection of fragments depicted in FIG. 39 would be as follows:

          Probe 1                                 Probe 2
Allele    Signal Quench     Differential          Signal Quench     Differential
                            Emission Patterns                       Emission Patterns
GG        Signal            Donor                 Signal quench     Acceptor
GA        Partial signal    Donor/Acceptor        Partial signal    Donor/Acceptor
AA        Signal quench     Acceptor              Signal            Donor

c. Detection Based on Incorporation of Modified Ribonucleotides

Some of the chemical cleavage reactions disclosed herein including, but not limited to, 7-NO₂-dA, 7-NO₂-dG, oxidized 5-OH-dC or 5-OH-dU, occur through ring-opening followed by loss of the incorporated modified base. In these cases, if a label were attached to the base, the fragment to be identified would lose the label during the reaction and thus would not be detectable (as in FIG. 38 for example).

In the cases of ribonucleotide cleavage, strand scission occurs with retention of the ribonucleotide at the 5′ end of the DNA fragments. Thus, using modified ribonucleotides has the advantage of labeling the polymorphism containing fragment and, if desired, nearest to one of the PCR primers. However, incorporation of ribonucleotides in reactions to amplify DNA may require the use of a polymerase having reduced discrimination between deoxy- and ribonucleotides. Polymerase incorporation of ribonucleotides is discussed under “C. Modified Nucleotide Incorporation” above and below in Example 1.

FIG. 40 demonstrates one approach to detecting polymorphisms by incorporation of labeled ribonucleotides in a DNA segment. First, PCR amplification of the region of DNA surrounding the single nucleotide polymorphism is performed in the presence of two labeled ribonucleotides, F1-rATP and F2-rGTP. In this example, F1 and F2 are different labels and thus can be differentially detected. In the example shown in the figure, there is an A or G polymorphism, which occurs downstream from primer 1. The amplified DNA segment incorporating the labeled F1-rATP and F2-rGTP is subjected to chemical cleavage at the site of incorporation of the labeled ribonucleotides to produce labeled fragments. The labeled fragments are identified in FIG. 40 as A allele-F1 and G allele-F2. The fragments are then contacted with an oligonucleotide probe under conditions amenable to hybridization. Depending on the different detectable labels, the presence of the A allele or G allele may be identified in the DNA sample. Further, a sample that has both types of alleles may appear as a hybrid signal.

An alternative method is to immobilize one primer, which is preferably in close proximity to the site of polymorphism, on a solid support such that the amplified DNA segment is likewise immobilized. In this way, after chemical cleavage the desired labeled fragment would remain attached to the solid support. This approach is shown in FIG. 41. FIG. 41 employs the same general procedure shown in FIG. 40. However, immobilization of the 5′ or 3′ primer to a solid support before or after the PCR reaction may be useful for any of the hybridization specific methods described above; all such approaches are within the scope of this invention.

ii. Intramolecular Methods for the Detection of Single Nucleotide Polymorphisms.

a. Methods Based on Multiple Labeled Nucleotides

In this method, a region surrounding the site of polymorphism is amplified in the presence of a cleavable nucleotide and two fluorescent dye containing nucleotides (for example, A* and C* in FIG. 42). The PCR amplification reaction is designed such that the amplified region contains one labeled nucleotide 5′ (A*) and one 3′ (C*) to the site of polymorphism. The A* and C* labeled nucleotides have different fluorescent emission wavelengths and thus will be differentially detectable. Further, in fragments in which both labels are incorporated, the different emission wavelengths can be used to detect the incorporation of the labels within the same sample. Detection of signal quenching may be used rather than emission detection to identify the allelic differences.

To initiate this approach, PCR amplification of the region surrounding the site of polymorphism is conducted in the presence of one modified cleavable nucleotide which is either of the two nucleotides identified at the site of polymorphism (dGmTP in FIG. 43A) and two different fluorescent dye-containing nucleotides. Complete substitution of the modified nucleotide is required, while only partial substitution of the two fluorescently labeled nucleotides may be necessary to ensure adequate detection of the resulting amplified product. However, complete substitution of the fluorescent nucleotides for the naturally occurring nucleotides is preferred.

The fragments resulting from the chemical cleavage reactions may require some clean-up. For example, in FIG. 43A, the TTA* fragment that retains the label may interfere with the emission wavelength detection of the label on the fragment containing the polymorphic site. This sample clean-up may be accomplished by filtration or slab gel electrophoresis prior to hybridization of the polymorphic site containing fragment to an immobilized oligonucleotide or by washing after hybridization. If FRET detection is used, clean-up may not be necessary since the TTA* labeled fragment will not be in close enough proximity to another dye containing nucleotide for FRET to occur and the only detectable wavelength attributable to the TTA* labeled fragments will most likely be the emission wavelength of the A* incorporated label.

Detection using FRET analysis of the resultant fragments should result in a quantitative difference due to the different labels on the two nucleotides. That is, as depicted in FIG. 42, a GG homozygote would have detectable emission wavelengths different from the AA homozygote. The heterozygote GC may be quantitatively different (rather than qualitatively) than the homozygote emission patterns.

An alternative approach to this method is to use a 5′ primer during the PCR reaction that has an incorporated label. In this way, the amplified polynucleotide sequence would have one label associated with the 5′ primer sequence and only one label that would be uniformly incorporated during the PCR reaction. This method may limit undesirable fragment interference and may obviate sample fragment clean-up or separation.

b. Methods Based on Generation of Hair-Pin Loops

In the four methods described below, detection of single nucleotide polymorphisms involves chemical cleavage reactions followed by hair-pin duplex formation. For ease of detection in each of these methods, a fluorescent label must be attached to the fragment containing the polymorphic site. As was described above, this can be accomplished by using labeled TCEP or a secondary amine during the cleavage reaction or using a labeled ribonucleotide during PCR amplification.

In the design of hair-pin loop formation for subsequent detection by FRET, criteria for optimal stability of the loop structure include minimization of the flank regions and loop base number as well as maximization of the stem region Watson-Crick interactions. Furthermore, stability within the loop may entail base stacking interactions. In addition, the effects of hair-pin loop formation on PCR amplification must be considered. That is, PCR amplification is best performed on linearized sequences. Thus, stability of the hair-pin loop structures must further include consideration of ease of linearization for adequate and precise amplification to occur.

1. In the first method, as shown in FIG. 43A, a primer is designed to form a duplex with the 3′ primer end amplified region of DNA. A fluorescent label is attached to this primer's 5′ end (G*) and a modified nucleotide (dGmTP) is substituted at all occurrences of the natural nucleotide in the amplified region of DNA, including the polymorphic site (the G/A in FIG. 43A). Alternatively, as noted previously, a labeled modified or unmodified ribonucleotide may be used.

The resultant PCR segments are subjected to chemical cleavage conditions, which may include a labeled TCEP or other secondary amine, followed by incubation under conditions that allow and enhance the stability of hair-pin loop structures. These hair-pin loop structures bring in close proximity the incorporated fluorescent label at the 3′ end (either via incorporation of a labeled ribonucleotide or by a labeled TCEP or secondary amine) and the 5′ fluorescent label attached to the primer. For signal quenching detection, the donor labels in close proximity to the acceptor molecules will undergo wavelength emission quenching. Thus, in the detection of the presence or absence of the polymorphism, the GG homozygote would result in a quenched signal, the AA homozygote would result in a detectable signal, and the GA heterozygote would result in an intermediate or partial signal, as depicted in the inset.

In cases where differential wavelengths are being detected, the GG homozygote will emit a detectable acceptor emission wavelength, the AA homozygote will emit a detectable donor emission wavelength and a GA heterozygote will emit both donor and acceptor emission wavelengths, as shown in the inset of FIG. 43A.

2. In cases where a less than optimal signal is obtained, inclusion of a different modified nucleotide at the site of polymorphism may be undertaken. For example, in FIG. 43A, the polymorphism is a G/A. If the above method were employed and the heterozygote samples could not be distinguished from homozygote samples, the above method could be repeated using a modified adenine nucleotide (or ribonucleotide). As shown in FIG. 43B, a modified A nucleotide and a similar primer having a label on its 5′ end could be used. The results of this second reaction (as shown in the inset) could confirm the results of the first reaction as described above.

3. An alternative to the above methods includes using two different primers. As in the previous two methods, either a labeled ribonucleotide incorporated during PCR or a labeled TCEP or secondary amine incorporated during the chemical cleavage reaction would be used to label the 3′ end of the resultant fragment. The first primer has an extended region at the 5′ end that is labeled, and is designed such that it can form a duplex with the amplified region beginning with the site of incorporation of the first non-polymorphic modified nucleotide. The second primer has a shorter 5′ region; however, it too can form a duplex with the amplified region of DNA beginning with the site of polymorphism. The ensuing hair-pin structure would bring the label in close proximity to the 3′ end of the fragment and a FRET emission or quenching will be observed. In the example shown in FIG. 43C, a single modified nucleotide is used during the PCR reaction. After chemical cleavage in the presence of a labeled TCEP or other secondary amine and conditions for optimal hair-pin loop formation, the detectable signals that would be obtained are shown in the following table. Where FRET quenching is detected, only the GA heterozygote will have an intermediate signal, whereas the GG will be quenched in the samples using the shorter primer and not detectable in samples using the longer primer. Conversely, the AA fragments will have a detectable signal in the sample fragments from the shorter primer amplicons and no detectable signal in the samples using the long primer. Where differential emission patterns are being detected, only the heterozygote will emit both donor and acceptor wavelengths, whereas the homozygote samples will emit either donor or acceptor wavelengths.

          Short Primer                            Long Primer
Allele    Signal Quench     Differential          Signal Quench     Differential
                            Emission Patterns                       Emission Patterns
GG        Signal quench     Acceptor              Signal            Donor
GA        Partial signal    Donor/Acceptor        Partial signal    Donor/Acceptor
AA        Signal            Donor                 Signal quench     Acceptor

4. Another approach to hair-pin loop design for detection of single nucleotide polymorphisms is shown in FIG. 44. In FIG. 44, a PCR primer is designed so that the 5′ end contains a fluorescent label and has the ability to form a hair-pin loop structure (AAAA with TTTT). The 3′ end, after extension in the amplification reaction, is able to form a duplex with an internal region of the primer. After amplification of the region surrounding the single nucleotide polymorphism (G/A in FIG. 44) in the presence of a modified nucleotide to completely substitute the cleavable nucleotide at the site of polymorphism, the resultant amplified products are subjected to chemical cleavage. As previously described, the cleavage may include labeled TCEP or labeled secondary amine, or a labeled ribonucleotide or modified ribonucleotide may be used during the PCR amplification reaction.

After complete cleavage, with possible TCEP or secondary amine labeling, the polymorphic site fragment is allowed to form a duplex complex as shown in FIG. 44. The fragments are then incubated under conditions selected to encourage the portion of the amplified region to interact with and form a duplex with the portion of the primer region, thereby enhancing cooperativity of base pair stacking interactions; in other words, to keep the TCEP-adduct label or ribonucleotide label in close proximity to the hair-pin stabilized label at the 5′ end of the primer. Where there is a GG homozygote, the signal will be quenched; however, in AA homozygotes, fluorescence will be detectable. Furthermore, GA heterozygotes will display an intermediate signal, as shown in the inset.

E. Serial Cleavage

The preceding discussion focuses primarily on the use of one cleavage reaction with any given modified polynucleotide. However, it is also possible, and it is a further aspect of this invention, to serially cleave a polynucleotide in which two or more natural nucleotides have been replaced with two or more modified nucleotides, which have different cleavage characteristics. That is, a polynucleotide that contains two or more types of modified nucleotides, either fully or partially substituted, can be cleaved by serial exposure to different cleavage conditions, either chemical, physical or both. One preferred embodiment of this approach is tandem mass spectrometry, where fragmented molecular species produced by one procedure can be retained in a suitable mass spectrometer (e.g. Fourier-transform ion cyclotron resonance mass spectrometer or ion trap mass spectrometer), for subsequent exposure to a second physical/chemical procedure that results in activation and cleavage at a second modified nucleotide. The product ions may be subjected to a third and even a fourth cleavage condition directed to specific modifications on a third and fourth nucleotide to enable observation of precursor-product relationships between the input (precursor) ions and those generated during each round of cleavage. The use of a continuous or stepwise gradient of cleavage conditions of increasing efficiency may be used to enhance the elucidation of precursor-product relationships between ions.

The production of a polynucleotide containing multiple modified nucleotides reduces the need to perform multiple polymerizations on the same template to produce a set of polynucleotides each with a different single modified nucleotide; i.e., one for cleavage at A, one for G, one for T and one for C. Also, the serial application of cleavage procedures specific for different nucleotides of a single polynucleotide enhances detection of precursor-product relationships, which is useful for determining DNA sequence. FIG. 21 shows the production of a polynucleotide modified by complete substitution of riboGTP for dGTP and 5′-amino-TTP for dTTP followed by cleavage with base, which results in cleavage at G, or cleavage with acid, which results in cleavage at T. Subsequent treatment of the base cleaved fragments with acid or vice versa results in further fragmentation into double (G and T) cleaved fragments. This would be useful, for example and without limitation, for identifying a variance at position 27 (dA) of the sequence (FIG. 21). That is, as can be seen in FIG. 21, cleavage at G alone produces the fragment ACTTCACCG (position 27 is highlighted), which contains two dA residues. A change in mass of this fragment of −24 Da, indicating an A to C change, would not permit determination of which of the two dA residues changed to dC. Similarly, cleavage at T alone to give the fragment TCACCGGCACCA, which contains three dA residues, also prevents determination of which dA was changed. However, double cleavage at G and T produces the fragment TCACCG, which undergoes the −24 Da mass shift and, because it only contains one dA, allows definitive assignment of the variant nucleotide. Schemes using this approach to precisely detect variances at other nucleotides will be apparent to those skilled in the art based on the disclosures herein and are within the scope and spirit of this invention.
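
The fragment logic of this example can be reproduced in a few lines. The sketch below is illustrative only: the 17-nucleotide segment is a hypothetical stretch assembled from the fragments quoted above (the full FIG. 21 sequence is not reproduced here), the cut sides (3′ of G, 5′ of T) are inferred from those fragments, and the integer nucleotide masses are back-calculated from the homopolymer entries of Table 4.

```python
# Integer nucleotide masses (Da) inferred from Table 4 homopolymers,
# e.g. (1962 + 61) / 7 = 289 for C. An assumption of this sketch.
NUCLEOTIDE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}
PHOSPHATE_LOSS = 61  # subtracted once per fragment, as in the Table 4 legend


def cleave(seq, after=(), before=()):
    """Split seq 3' of every base in `after` and 5' of every base in `before`."""
    fragments, current = [], ""
    for base in seq:
        if base in before and current:
            fragments.append(current)
            current = ""
        current += base
        if base in after:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def fragment_mass(fragment):
    """Mass of a fragment using the rounded Table 4 style arithmetic."""
    return sum(NUCLEOTIDE_MASS[b] for b in fragment) - PHOSPHATE_LOSS


# Hypothetical segment built around the fragments quoted in the text.
segment = "GACTTCACCGGCACCAT"

g_only = cleave(segment, after="G")               # base cleavage at ribo-G
t_only = cleave(segment, before="T")              # acid cleavage at 5'-amino-T
g_and_t = cleave(segment, after="G", before="T")  # serial (double) cleavage

for name, frags in [("G", g_only), ("T", t_only), ("G+T", g_and_t)]:
    print(name, [(f, f.count("A")) for f in frags])

# An A to C change in the doubly cleaved fragment shifts its mass by -24 Da.
print(fragment_mass("TCCCCG") - fragment_mass("TCACCG"))
```

Only the doubly cleaved fragment TCACCG contains a single dA, so a −24 Da shift in that fragment can be assigned to one position.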

A further aspect of this invention is an algorithm or algorithms, which permit the use of computers to directly infer DNA sequence or the presence of variances from mass spectrometry.

F. Parallel Cleavage

It is likewise possible, and it is a further aspect of this invention, that a polynucleotide which has been substituted with two or more modified nucleotides, each of which is susceptible to a different cleavage procedure, may be analyzed in parallel fashion. That is, one can divide the polynucleotide into aliquots and expose each aliquot to a cleavage procedure specific for one of the modified nucleotides. This saves the effort of performing independent polymerization reactions for each of the modified nucleotides. This approach can be used to generate sequence ladders, or to generate complete cleavage products for variance detection. As reviewed in Example 5, complete cleavage at two different nucleotides (performed independently), followed by mass spectrometry, substantially increases the efficiency of variance detection compared to cleavage at a single nucleotide.

For example, consider a single polynucleotide substituted with ribo-A, 5′-amino-C, and 5′-(bridging) thio-G nucleotides. All three modified nucleotides are known to be incorporated by polymerases. Sequence ladders can be produced from such a modified polynucleotide by exposure of one aliquot to acid, resulting in cleavage at C; exposure of a second aliquot to base, resulting in cleavage at A; and exposure of a third aliquot to silver or mercury salts, resulting in cleavage at G. It is possible that a polynucleotide produced with the three above modified nucleotides plus 4′-C-acyl T could also (separately) be exposed to UV light to produce cleavage at T, resulting in a complete set of sequencing reactions from a single polymerization product.
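
How four such aliquots combine into a readable sequence can be sketched as follows. This is a simplified model under stated assumptions: the template is hypothetical, cleavage is treated as partial, only fragments retaining the 5′ end are scored, and each scored fragment is assumed to end with the cleaved nucleotide at its 3′ end.

```python
# Cleavage condition applied to each aliquot, as listed in the text above.
CLEAVAGE_CONDITIONS = {
    "C": "acid (5'-amino-C)",
    "A": "base (ribo-A)",
    "G": "silver or mercury salts (5'-(bridging) thio-G)",
    "T": "UV light (4'-C-acyl T)",
}


def ladder_lengths(seq, base):
    """Lengths of 5'-anchored fragments produced by partial cleavage at `base`.

    Assumes the cleaved nucleotide stays on the 3' end of the fragment.
    """
    return [i for i, b in enumerate(seq, start=1) if b == base]


def read_sequence(ladders, length):
    """Rebuild the sequence from the four ladders; unread positions are 'N'."""
    calls = ["N"] * length
    for base, lengths in ladders.items():
        for n in lengths:
            calls[n - 1] = base
    return "".join(calls)


template = "GACTTCACCG"  # hypothetical polymerization product
ladders = {base: ladder_lengths(template, base) for base in CLEAVAGE_CONDITIONS}

for base, lengths in ladders.items():
    print(f"{CLEAVAGE_CONDITIONS[base]:<48} -> fragment lengths {lengths}")
print(read_sequence(ladders, len(template)))
```

All four ladders here derive from one polymerization product, which is the saving the parallel approach is intended to provide.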

G. Combination of Modified Nucleotide Cleavage and Chain Termination

Another application of modified nucleotide incorporation and cleavage isto combine it with a chain termination procedure. By incorporating oneor more modified nucleotides in a polymerization procedure (for examplebut without limitation, modified A) with a different chain terminatingnucleotide, such as a dideoxy-G, a Sanger-type ladder of fragmentsterminating at the dideoxy-nucleotide can be generated. Subsequentexposure of this ladder of fragments to a chemical that cleaves at themodified A will result in further fragmentation, with the resultingfragments terminating 5′ to A and 3′ to either A (most of the time) or G(in one fragment per chain termination product). Comparison of theresulting fragment set with a fragment set produced solely bysubstitution and cleavage at the modified nucleotide (A) will provide aninstructive comparison: all the fragments will be the same except forthe presence of extra fragments in the chain terminating set which endat 3′ G, which, on mass spectrometric analysis would provide the mass(and by inference the nucleotide content) of all fragments in which an Ais followed (directly or after some interval) by a G, without anintervening A. Derivation of similar data using other chain terminatingnucleotides and other cleavage nucleotides will cumulatively provide aset of data useful for determining the sequence of the polymerizationproducts.

H. Cleavage Resistant Modified Nucleotide Substitution and Mass Shifting Nucleotides

The preceding embodiments of this invention relate primarily to the substitution into a polynucleotide of one or more modified nucleotides which have the effect of enhancing the susceptibility of the polynucleotide to cleavage at the site(s) of incorporation of the modified nucleotide(s) in comparison to unmodified nucleotides. It is entirely possible, however, and it is yet another aspect of this invention, to use a modified nucleotide which, when incorporated into a polynucleotide, reduces susceptibility to cleavage at the site of incorporation of the modified nucleotide compared to unmodified sites. In this scenario, cleavage would then occur at unmodified sites in the polynucleotide. Alternatively, a combination of cleavage-resistant and cleavage-sensitive modified nucleotides may be incorporated into the same polynucleotide to optimize the differential between cleavable and non-cleavable sites.

An example of a modified nucleotide which imparts this type of resistance to cleavage is the 2′-fluoro derivative of any natural nucleotide. The 2′-fluoro derivative has been shown to be substantially less susceptible to fragmentation in a mass spectrometer than unsubstituted natural nucleotides.

As shown in Table 2, the mass differences between the naturally occurring nucleotides range from 9 to 40 Da and are sufficient for resolving single nucleotide differences in all fragments of 25 mer size and under. However, it may be desirable to increase the mass difference between the four nucleotides or between any pair of nucleotides to simplify their detection by mass spectrometry. This is illustrated for dA and its 2-chloroadenine analog in Table 2. That is, substitution with 2-chloroadenine, mass 347.7, increases the A-T mass difference from 9 Da to 42.3 Da, the A-C difference from 24 to 57.3 Da and the A-G difference from 16 to 17.3 Da. Other mass-shifting nucleotide analogs are known in the art and it is an aspect of this invention that they may be used to advantage with the mass spectrometric methods of this invention.

I. Applications

A number of applications of the methods of the present invention aredescribed below. It is understood that these descriptions are exemplaryonly and are not intended to be nor are they to be construed as beinglimiting on the scope of this invention in any manner whatsoever. Thus,other applications of the methods described herein will become apparentto those skilled in the art based on the disclosures herein; suchapplications are within the scope and spirit of this invention.

a. Full Substitution, Full Extension and Complete Cleavage.

In one aspect of the present invention at least one of the four nucleotides of which the target polynucleotide is composed is completely replaced with a modified nucleotide (either on one strand using primer extension, or on both strands using a DNA amplification procedure), a full length polynucleotide is made and substantially complete cleavage is effected. The result will be cleavage of modified polynucleotides into fragments averaging four nucleotides in length. This is so because the abundance of A, T, G and C nucleotides is roughly equal in most genomes and their distribution is semi-random. Therefore a particular nucleotide occurs approximately once every four nucleotides in a natural polynucleotide sequence. There will, of course, be a distribution of sizes, with considerable deviation from the average size due to the non-random nature of the sequence of biological polynucleotides, and the unequal amounts of A:T vs. G:C base pairs in different genomes. The extended primer (whether primer extension or amplification) will not be cleaved until the first occurrence of a modified nucleotide after the end of the primer, resulting in fragments of greater than 15 nt (i.e., greater than the length of the primer). Often, these primer-containing fragments will be the largest or among the largest produced. This can be advantageous in the design of genotyping assays. That is, primers can be designed so that the first occurrence of a polymorphic nucleotide position is after the primer. After cleavage, the genotype can be determined from the length of the primer-containing fragment. This is illustrated in FIGS. 27-32. Due to this variation in the size of analyte masses it is essential that the mass spectrometer be capable of detecting polynucleotides ranging up to 20 mers, or even 30 mers, with a level of resolution and mass accuracy consistent with unambiguous determination of the nucleotide content of each mass. As discussed below, this requirement has different implications depending on whether the nucleotide sequence of the analyte polynucleotide is already known (as will generally be the case with variance detection or genotyping) or not (as will be the case with de novo DNA sequencing).
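The expectation of an average fragment of about four nucleotides, with a broad spread around it, can be checked with a small simulation. This is only a sketch: it assumes a uniformly random sequence (which real genomic sequence is not, as noted above), cleavage 3′ of every modified nucleotide, and no primer. The function name is hypothetical.

```python
import random


def fragment_lengths(seq, base):
    """Lengths of fragments from complete cleavage 3' of every occurrence of base."""
    lengths, run = [], 0
    for nt in seq:
        run += 1
        if nt == base:
            lengths.append(run)
            run = 0
    if run:  # trailing fragment with no cleavable nucleotide at its end
        lengths.append(run)
    return lengths


rng = random.Random(0)  # fixed seed, used only so the run is repeatable
seq = "".join(rng.choice("ACGT") for _ in range(10_000))

lengths = fragment_lengths(seq, "A")
mean_length = sum(lengths) / len(lengths)
print("mean fragment length:", round(mean_length, 2))
print("longest fragment:", max(lengths))
```

Even in this idealized model the longest fragments are several times the mean, which is why the mass spectrometer must handle fragments well beyond four nucleotides.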

i. Applications to Variance Detection

Variance detection is usually performed on an analyte DNA or cDNAsequence for which at least one reference sequence is available. Theconcern of variance detection is to examine a set of correspondingsequences from different individuals (sample sequences) in order toidentify sequence differences between the reference and sample sequencesor among the sample sequences. Such sequence variances will beidentified and characterized by the existence of different masses amongthe cleaved sample polynucleotides.

Depending on the scope of the variance detection procedure, analyte fragments of different lengths may be optimal. For genotyping, it is desirable that one primer be close to the known variant site.

Generally an analyte fragment of at least 50 nucleotides, morepreferably at least 100 nucleotides and still more preferably at least200 nucleotides will be produced by polymerase incorporation of modifiednucleotides (either A, G, C or T), followed by cleavage at the sites ofmodified nucleotide incorporation, and mass spectrometric analysis ofthe resulting products. Given the frequency of nucleotide variances(estimated at one in 200 to one in 1000 nucleotides in the humangenome), there will generally be zero or only one or two cleavagefragments that differ among any two samples. The fragments that differamong the samples may range in size from a monomer to a 10 mer, lessfrequently up to a 20 mer or, rarely, a fragment of even greater length;however, as noted above, the average cleavage fragment will be about 4nucleotides. Knowledge of the reference sequence can be used to avoidcleavage schemes that would generate very large cleavage products, andmore generally to enhance the detectability of any sequence variationthat may exist among the samples by computing the efficiency of variancedetection at each nucleotide position for all possible cleavage schemes,as outlined below. However, large sequences are not really a problemwhen a reference sequence is available and the analyte fragment lengthis only several hundred nucleotides. This is because it is extremelyunlikely that any analyte fragment will contain two large cleavagemasses that are close in size. In general, if there are only a few largefragments they can be easily identified and, as Table 5 shows, even witha MALDI instrument capable of mass resolution of only 1000, the mostdifficult substitution, an A<->T change resulting in a 9 amu shift canbe detected in a 27 mer. 
TABLE 5

                           Resolving Power of MS Instrument (FWHM)
                           1,000       1,500       2,000       10,000
Nucleotide                 Maximum fragment in which Δ at left
substitution     Δ (Da)    is resolvable
C <-> G          40        123 nt      184 nt      246 nt      1,230 nt
G <-> T          25         77 nt      116 nt      154 nt        770 nt
A <-> C          24         74 nt      111 nt      148 nt        740 nt
A <-> G          16         49 nt       74 nt       98 nt        490 nt
C <-> T          15         46 nt       69 nt       92 nt        460 nt
A <-> T           9         27 nt       41 nt       55 nt        270 nt

Table 5 summarizes the relation between mass spectrometer resolution and nucleotide changes in determining the maximum size fragment in which a given base change can be identified. The maximum size DNA fragment (in nucleotides; nt) in which a base substitution can theoretically be resolved is provided in the four columns at right (bottom 6 rows) for each possible nucleotide substitution, listed in the column at left. As is evident from the table, the mass difference created by each substitution (Δ, measured in Daltons) and the resolving power of the mass spectrometer determine the size limit of fragments that can be successfully analyzed. Commercially available MALDI instruments can resolve from 1 part in 1,000 to 1 part in 5,000 (FWHM) while available ESI instruments can resolve 1 part in 10,000. Modified ESI MS instruments are capable of at least 10-fold greater mass resolution. (The theoretical resolution numbers in the table do not take into consideration limitations on actual resolution imposed by the isotopic heterogeneity of molecular species and the technical difficulty of efficiently obtaining large ions.) FWHM, full width at half-maximal height, is a standard measure of mass resolution. (For further information on resolution and mass accuracy in MS see, for example: Siuzdak, G. Mass Spectrometry for Biotechnology, Academic Press, San Diego, 1996.)
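The relation behind Table 5 can be sketched as a one-line estimate: a shift of Δ is resolvable while the fragment mass divided by the resolving power does not exceed Δ. The average residue mass of 325 Da used below is an assumption (the value used to compute the table is not stated), so the results agree with the tabulated sizes only to within a few percent. The function name is hypothetical.

```python
AVERAGE_NT_MASS = 325.0  # Da; assumed average residue mass, not given in the source


def max_resolvable_nt(delta_da, resolving_power, nt_mass=AVERAGE_NT_MASS):
    """Approximate largest fragment (nt) in which a shift of delta_da is resolvable."""
    # Peak width is roughly mass / resolving_power, so require mass <= delta * R.
    max_mass = delta_da * resolving_power
    return max_mass / nt_mass


for label, delta in [("C <-> G", 40), ("A <-> T", 9)]:
    sizes = [round(max_resolvable_nt(delta, r)) for r in (1000, 2000, 10000)]
    print(label, sizes)
```

The estimate scales linearly in both Δ and resolving power, which is why the smallest shift (A <-> T, 9 Da) sets the practical fragment-size limit for a given instrument.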

In order to select experimental conditions for variance detection that maximize the likelihood of success, one can use the reference sequence to predict the fragments that would be produced by cleavage at A, G, C or T in advance of experimental work. Based on such an analysis, the optimal modified nucleotide substitution and cleavage scheme can be selected for each DNA or cDNA sequence that is to be analyzed. Such an analysis can be performed as follows:

-   For each nucleotide of the test polynucleotide, substitute each of the three other possible nucleotides and generate an associated mass change. For example, if at position 1 the test polynucleotide begins with A, then generate hypothetical polynucleotides beginning with T, G and C. Next move to position two of the test sequence and again make all three possible substitutions, and so forth for all positions of the test polynucleotide. If the test polynucleotide is 100 nucleotides in length then altogether 300 new hypothetical fragments will be generated by this procedure on one strand and another 300 on the complementary strand. Each set of three substitutions can then be analyzed together.
-   Generate the masses that would be produced by cleaving at T, C, G or A each of the three new hypothetical test fragments obtained by the substitutions of T, C or G for A at position 1. Compare these mass sets with the set of masses obtained from the reference sequence (which in our example has A at position 1). For each of the four cleavages (T, C, G, A), determine whether the disappearance of an existing mass or the generation of a new mass would create a difference in the total set of masses. If a difference is created, determine whether it is a single difference or two differences (i.e. a disappearance of one mass and an appearance of another). Also determine the magnitude of the mass difference compared to the set of masses generated by cleavage of the reference sequence. Perform this same analysis for each of the 100 positions of the test sequence, in each case examining the consequences of each of the four possible base-specific cleavages, i.e., for DNA, at A, C, G and T.
-   Generate a correlation score for each of the four possible base-specific cleavages. The correlation score increases in proportion to the fraction of the 300 possible deviations from the reference sequence that produce one or more mass changes (i.e., a higher correlation score for two mass differences), and in proportion to the extent of the mass differences (greater mass differences score higher than small ones).
-   In the case of primer extension, the analysis is performed for one strand; in the case of amplification, the computation is carried out on the products of cleavage of both strands.
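The steps above can be sketched in a few lines. This is a simplified illustration, not the scoring described in the list: it reports only the fraction of single-base substitutions that change the set of cleavage masses, and does not weight by the number or size of the mass differences. The residue masses are rounded nominal values and ignore end groups, cleavage is assumed to fall 3′ of the modified nucleotide, only one strand is analyzed, and all names are hypothetical.

```python
# Rounded nominal residue masses (Da); an assumption for illustration only.
RESIDUE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}


def mass_set(seq, base):
    """Distinct fragment masses after complete cleavage 3' of every base."""
    masses, current = set(), 0
    for nt in seq:
        current += RESIDUE_MASS[nt]
        if nt == base:
            masses.add(current)
            current = 0
    if current:
        masses.add(current)
    return masses


def detection_fraction(reference, base):
    """Fraction of all single-base substitutions that alter the mass set."""
    ref_masses = mass_set(reference, base)
    detected = total = 0
    for i, ref_nt in enumerate(reference):
        for alt in "ACGT":
            if alt == ref_nt:
                continue
            variant = reference[:i] + alt + reference[i + 1:]
            total += 1
            if mass_set(variant, base) != ref_masses:
                detected += 1
    return detected / total


reference = "CACATA"  # toy reference sequence
scores = {base: detection_fraction(reference, base) for base in "ACGT"}
best = max(scores, key=scores.get)
print(scores, "best cleavage base:", best)
```

Masses are compared as sets because a fragment mass that occurs twice looks the same as one that occurs once; in the toy reference, cleavage at A misses the C to T change at the first position for exactly that reason.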

The above method can be extended to the use of combinations ofsubstitution and cleavage. For example, T cleavage on each of thestrands of the analyte polynucleotide (either independent orsimultaneous cleavage of both strands at T), or cleavage at T and A onone strand (again, either independent or simultaneous cleavage of bothstrands), or cleavage of one strand with T and cleavage of thecomplementary strand with A, and so forth. Based on the generatedcorrelation scores for each of the different schemes, an optimal schemecan be determined in advance of experimental work.

A computer program can be constructed to accomplish the above task. Sucha program can also be extended to encompass the analysis of experimentalcleavage masses. That is, the program can be constructed to compare allthe masses in the experimentally determined mass spectrum with thecleavage masses expected from cleavage of the reference sequence and toflag any new or missing masses. If there are new or missing masses, theexperimental set of masses can be compared with the masses generated inthe computational analysis of all the possible nucleotide substitutions,insertions or deletions associated with the experimental cleavageconditions. However, nucleotide substitutions are about ten times morecommon than insertions or deletions, so an analysis of substitutionsalone should be useful. In one embodiment, the computational analysisdata for all possible nucleotide insertions, deletions and substitutionscan be stored in a look-up table. The set of computational masses thatmatches the experimental data then provides the sequence of the newvariant sequence or, at a minimum, the restricted set of possiblesequences of the new variant sequence. (The location and chemical natureof a substitution may not be uniquely specified by one cleavageexperiment.) To resolve all ambiguity concerning the nucleotide sequenceof a variant sample may require, in some cases, another substitution andcleavage experiment (see Section E, Serial Cleavage and DNA sequencingapplications described below), or may be resolved by some othersequencing method (e.g. conventional sequencing methods or sequencing byhybridization). It may be advantageous to routinely perform multipledifferent substitution and cleavage experiments on all samples tomaximize the fraction of variances, which can be precisely assigned to aspecific nucleotide.
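A look-up table of this kind might be sketched as follows. The sketch covers substitutions only, uses rounded nominal residue masses that ignore end groups, assumes cleavage 3′ of the modified nucleotide, and reports positions as zero-based indices; all of these, along with the names, are assumptions made for illustration.

```python
from collections import defaultdict

# Rounded nominal residue masses (Da); an assumption for illustration only.
RESIDUE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}


def cleavage_masses(seq, base):
    """Distinct fragment masses after complete cleavage 3' of every base."""
    masses, current = set(), 0
    for nt in seq:
        current += RESIDUE_MASS[nt]
        if nt == base:
            masses.add(current)
            current = 0
    if current:
        masses.add(current)
    return frozenset(masses)


def build_lookup(reference, base):
    """Map each predicted mass set to the substitutions that would produce it."""
    table = defaultdict(list)
    for i, ref_nt in enumerate(reference):
        for alt in "ACGT":
            if alt != ref_nt:
                variant = reference[:i] + alt + reference[i + 1:]
                table[cleavage_masses(variant, base)].append((i, ref_nt, alt))
    return table


reference = "CACATA"  # toy reference sequence
lookup = build_lookup(reference, "A")
expected = cleavage_masses(reference, "A")

observed = frozenset({602, 617, 642})      # hypothetical experimental masses
new_masses = observed - expected           # masses to flag as new
missing_masses = expected - observed       # masses to flag as missing
candidates = lookup.get(observed, [])      # substitutions consistent with the data
print(sorted(new_masses), sorted(missing_masses), candidates)
```

In this toy case two different substitutions give the same mass set, which illustrates why a single cleavage experiment may yield only a restricted set of possible variant sequences.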

The inventors have performed a computational analysis of natural polynucleotides of 50, 100, 150, 200 and 250 nucleotides and discovered that combinations of two nucleotide cleavages (for example cleave at A on one strand and G on the complementary strand) result in 99-100% detection efficiency, considering all possible substitutions up to 250 nt. Potentially useful but sometimes less than 100% sensitive analyses can be performed on longer fragments up to 1000 nt. See Example 5 for details of this analysis.

ii. Applications to DNA Sequencing

A still further aspect of this invention utilizes the chemical methodsdisclosed herein together with mass spectrometry to determine thecomplete nucleotide sequence of a polynucleotide de novo. The procedureinvolves the same reactions described above for variance detection;i.e., total replacement of one of the four nucleotides in apolynucleotide with a modified nucleotide followed by substantiallycomplete cleavage of the modified polynucleotide at each and every pointof occurrence of the modified nucleotide and then determination of themasses of the fragments obtained. In this case, however, it may benecessary to routinely perform four sets of cleavage reactions, adifferent natural nucleotide being replaced with a modified nucleotidein each reaction so that all four natural nucleotides are in turnreplaced with modified nucleotides and the resultant modifiedpolynucleotides are cleaved and the masses of the cleavage productsdetermined. It may also be necessary to employ one or more multiplenucleotide substitutions, as discussed above, to resolve sequencingambiguities that may arise. While the number of reactions necessary persequence determination experiment is thus similar to that required forMaxam-Gilbert or Sanger sequencing, the method of this invention has theadvantages of eliminating radiolabels or dyes, providing superior speedand accuracy, permitting automation and eliminating artifacts, includingcompressions, associated with Maxam-Gilbert and Sanger sequencing or anyother gel-based methods. This latter consideration may be of preeminentimportance as mass spectrometry will currently allow analysis ofcleavage reactions in a matter of seconds to minutes (and, in thefuture, milliseconds), compared to hours for current gel electrophoreticprocedures. 
Furthermore, the inherent accuracy of mass spectrometry,together with the control over the construction of the modifiedpolynucleotide that can be achieved using the methods of this inventionwill sharply reduce the need for sequencing redundancy. A representativetotal sequencing experiment is set forth in the Examples section, below.

The process of inferring DNA sequence from the pattern of massesobtained by cleavage of analyte molecules is considerably morecomplicated than the process for detecting and inferring the chemicalnature of sequence variances. In the case of sequencing by completecleavage and mass analysis the following must be accomplished:

-   Determine the length of the sequence. From the experimentally determined masses infer the nucleotide content of each cleavage fragment as discussed elsewhere herein. This analysis is performed for each of the four sets of experimental cleavage masses. The shortcoming of this analysis is that two or more fragments (particularly short ones) may have identical mass, and therefore may be counted as one, leading to an undercounting of the length of the sequence. However, this is not a serious experimental problem in that the fragment masses can be summed and compared for all four cleavages; if they do not correspond then there must be two or more overlapping masses among the fragments. Thus, the determination of all fragment masses in all four cleavage reactions essentially eliminates this source of potential error. First, the set of cleavage masses that give the greatest length can be taken as a starting point. Next, the nucleotide content of all of the masses in the other three cleavage reactions can be tested for whether they are compatible with the nucleotide content of any of the masses associated with the greatest length cleavage set. If they are not compatible, then there must be undercounting even in the set associated with the greatest length. Comparison of sequence contents will generally allow the uncounted bases to be identified and the full length of the sequence to thus be determined.
-   The next aspect of the analysis may include: (a) determining the intervals at which A, C, G and T nucleotides must occur based on the sizes of respective cleavage products; (b) analyzing the nucleotide content of the largest fragments from each cleavage set to identify sets of nucleotides that belong together; (c) comparing nucleotide content of fragments between the different sets to determine which fragments are compatible (i.e. one could be subsumed within the other or they could overlap) or incompatible (no nucleotides in common); (d) beginning to integrate the results of these different analyses to restrict the number of ways in which fragments can be pieced together. The elimination of possibilities is as useful as the identification of possible relationships. A detailed illustration of the logic required to work out the sequence of a short oligonucleotide is provided in Example 4.
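The first step in the list above, inferring nucleotide content from a fragment mass, can be illustrated by brute-force enumeration. This is a minimal sketch under stated assumptions: rounded nominal residue masses with no end-group correction, a fixed mass tolerance, and a hypothetical function name. Real fragment masses depend on the cleavage chemistry.

```python
from itertools import combinations_with_replacement

# Rounded nominal residue masses (Da); an assumption for illustration only.
RESIDUE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}


def compositions_for_mass(mass, max_len=10, tolerance=0.5):
    """Nucleotide compositions (order-free, as sorted strings) matching a mass."""
    hits = []
    for length in range(1, max_len + 1):
        # Each combination is one unordered multiset of nucleotides.
        for combo in combinations_with_replacement("ACGT", length):
            total = sum(RESIDUE_MASS[nt] for nt in combo)
            if abs(total - mass) <= tolerance:
                hits.append("".join(combo))
    return hits


print(compositions_for_mass(602))   # one A and one C
print(compositions_for_mass(1204))  # two A and two C
```

A composition fixes which nucleotides a fragment contains but not their order, which is why the compatibility checks between cleavage sets described above are still needed.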

One way to provide additional information about local sequence relationships is to reduce the extent of nucleotide substitution or the completeness of cleavage (see below) in order to obtain sets of incompletely (but still substantially) cleaved fragments. The mass analysis of such fragments may be extremely useful, in conjunction with the completely cleaved fragment sets, for identifying which fragments are adjacent to each other. A limited amount of such information is needed to complete the entire puzzle of assembling the cleavage fragments into a continuous sequence.

Three additional ways to augment the inference of DNA sequence from analysis of complete substitution and cleavage masses are: (a) analysis of dinucleotide cleavage masses (see below), which can provide a framework for compartmentalizing the small masses associated with mononucleotide substitution and cleavage into fewer intermediate size collections. Dinucleotide cleavage also provides the location of dinucleotide sequences at intervals along the entire sequence; in fact, dinucleotide cleavage at all possible dinucleotides is an alternate DNA sequencing method; (b) mononucleotide substitution and cleavage of the complementary strand using one or more modified nucleotides, which can provide valuable complementary information on fragment length and overlaps; (c) combination substitution and cleavage schemes employing simultaneous di- and mononucleotide cleavages or two different simultaneous mononucleotide cleavages, which can provide unambiguous information on sequence order.

In the foregoing descriptions, it has been assumed that the modifiednucleotide is selectively more susceptible to chemical cleavage underappropriate conditions than the three unmodified nucleotides. However,an alternative approach to effecting mononucleotide cleavage is to usethree modified nucleotides that are resistant to cleavage under chemicalor physical conditions sufficient to induce cleavage at an unmodified,natural nucleotide. Thus, in another aspect of the present invention,mononucleotide cleavage may be effected by selective cleavage at anunmodified nucleotide. One chemical modification of nucleotides whichhas been shown to make them more stable to fragmentation during massspectrometric analysis is the 2′-fluoro modification. (Ono, T., et al.,Nucleic Acids Research, 1997, 25: 4581-4588.) The utility of 2′-fluorosubstituted DNA for extending the accessible mass range for Sangersequencing reactions (which is generally limited by fragmentation) hasbeen recognized, but it is an aspect of the present invention that thischemistry also has utility in effecting nucleotide specific cleavage byfully substituting three modified nucleotides that are resistant to aspecific physical or chemical cleavage procedure. Another chemicalmodification that has been shown to increase the stability ofnucleotides during MALDI-MS is the 7-deaza analog of adenine andguanine. (Schneider, K. and Chait, B. T., Nucleic Acids Research, 1995,23: 1570-1575.)

In another aspect of this invention, cleavage-resistant modified nucleotides may be used in conjunction with cleavage-sensitive modified nucleotides to effect a heightened degree of selectivity in the cleavage step.

iii. Applications to Genotyping

As DNA sequence data accumulates from various species there isincreasing demand for accurate, high throughput, automatable andinexpensive methods for determining the status of a specific nucleotideor nucleotides in a biological sample, where variation at a specificnucleotide (either polymorphism or mutation) has previously beendiscovered. This procedure—the determination of the nucleotide at aparticular location in a DNA sequence—is referred to as genotyping.Genotyping is in many respects a special case of DNA sequencing (orvariance detection where only one position is being queried), but thesequence of only one nucleotide position is determined. Because only onenucleotide position must be assayed, genotyping methods do not entirelyoverlap with DNA sequencing methods. The methods of this inventionprovide the basis for novel and useful genotyping procedures. The basisof these methods is polymerization of a polynucleotide spanning thepolymorphic site. The polymerization may be either by the PCR method orby primer extension, but is preferably by PCR. The polymerization isperformed in the presence of three natural nucleotides and onechemically modified nucleotide, such that the chemically modifiednucleotide corresponds to one of the nucleotides at the polymorphic ormutant site. For example if an A/T polymorphism is to be genotyped thecleavable nucleotide could be either A or T. If a G/A polymorphism is tobe genotyped the cleavable nucleotide could be either A or G. Converselythe assay could be set up for the complementary strand, where T and Coccur opposite A and G. Subsequently the polymerization product ischemically cleaved by treatment with acid, base or other cleavagescheme. This results in two products from the two possible alleles, onelonger than the other as a result of the presence of the cleavablenucleotide at the polymorphic site in one allele but not the other. Amass change, but not a length change, also occurs on the oppositestrand. 
One constraint is that one of the primers used for producing the polynucleotide must be located such that the first occurrence of the cleavable nucleotide after the end of the primer is at the polymorphic site. This usually requires one of the primers to be close to the polymorphic site. An alternative method is to simultaneously incorporate two cleavable nucleotides, one for a polymorphic nucleotide on the (+) strand, one for a polymorphic site on the (−) strand. For example, one might incorporate cleavable dA on the (+) strand (to detect an A-G polymorphism) and cleavable dC on the (−) strand (to positively detect the presence of the G allele on the (+) strand). In this case, it may be advantageous to have both primers close to the variant site. The two allelic products of different size can be separated by electrophoretic means, such as, without limitation, capillary electrophoresis. They could also be separated by mass using, without limitation, mass spectrometry. In addition, a FRET assay can be used to detect them, as described below. Any of these three assay formats is compatible with multiplexing by means known in the art.
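The primer placement constraint can be made concrete with a small sketch that computes the length of the primer-containing fragment for each allele of an A/G polymorphism. The primer and template sequences are invented for illustration, the cut is assumed to fall 5′ of the cleavable nucleotide (the actual position depends on the cleavage chemistry), and the function name is hypothetical.

```python
def primer_fragment_length(strand, primer_len, cleavable):
    """Length of the primer-containing fragment after complete cleavage.

    The primer (first primer_len nucleotides) consists of natural nucleotides
    and is not cleaved. The strand is cut at the first cleavable nucleotide
    after the primer; the cut is assumed to fall 5' of that nucleotide.
    """
    for i in range(primer_len, len(strand)):
        if strand[i] == cleavable:
            return i
    return len(strand)  # no cleavable nucleotide downstream of the primer


primer = "CTGCAGTC"  # hypothetical 8-nt primer; its own A is not cleavable
# No A occurs between the primer and the polymorphic site.
allele_a = primer + "CCTG" + "A" + "CCTAG"
allele_g = primer + "CCTG" + "G" + "CCTAG"

lengths = {
    name: primer_fragment_length(strand, len(primer), "A")
    for name, strand in [("A allele", allele_a), ("G allele", allele_g)]
}
print(lengths)
```

With the cleavable nucleotide first occurring at the polymorphic site, the A allele gives the shorter primer-containing fragment and the G allele is cut only at the next downstream A.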

One way to perform a FRET detection for the presence or absence of the allelic cleavage product is to introduce a probe with a fluor or a quencher moiety such that the probe hybridizes differentially to the cleaved strand (representing one allele) vs. the non-cleaved strand (representing the other allele; see FIG. 2 for illustration of several possible schemes). Such differential hybridization is readily achievable because one strand is longer than the other by at least one, and often several nucleotides. If a fluor or quenching group is also placed on the primer used to produce the cleavable polynucleotide (by PCR or primer extension) such that an appropriate FRET interaction between the moiety on the probe and the moiety on the primer exists, i.e., the absorbing and emitting wavelengths of the two moieties are matched, and the distance and orientation between the two moieties is optimized by methods known to those skilled in the art, then a powerful signal will be present with one allele but not the other when the probe and primer are heated at the temperature that affords maximal hybridization discrimination. Ideally the probe is synthesized in a manner that takes maximal advantage of the different length of the cleaved and non-cleaved alleles. For example the probe should hybridize to the region that is removed by cleavage in one allele but is present in the other allele. When selecting primers for the PCR or primer extension one experimental design consideration would be to locate the primer so as to maximize the length difference between the two alleles. Other means of maximizing the discrimination would include the use of a “molecular beacon” strategy where the ends of the probe are complementary, and form a stem, except in the presence of the non-cleaved allele where the non-cleaved segment is complementary to the stem of the probe and therefore effectively competes with the formation of intramolecular stems in the probe molecule (FIGS. 32 and 33).

The above FRET methods can be performed in a single tube, for example, as follows: (1) PCR; (2) addition of cleavage reagent (and heat if necessary); (3) addition of the probe; and (4) temperature ramping if necessary in an instrument such as the ABI Prism which is capable of excitation and fluorescence detection in 96 wells.

Another way to produce a FRET signal that discriminates between the two variant alleles is to incorporate a nucleotide with a dye that interacts with the dye on the primer. The key to achieving differential FRET is that the dye modified nucleotide must first occur (after the 3′ end of the primer) beyond the polymorphic site so that, after cleavage, the nucleotide dye of one allele (cleaved) will no longer be within the requisite resonance producing distance of the primer dye while, in the other (uncleaved) allele, the proper distance will be maintained and FRET will occur. The only disadvantage of this method is that it requires a purification step to remove unincorporated dye molecules that can produce a background signal, which might interfere with the FRET detection. A non-limiting example of the experimental steps involved in carrying out this method is: (1) PCR with dye-labeled primer and either a cleavable modified nucleotide which also carries a dye or one cleavable modified nucleotide and one dye-labeled nucleotide. The dye can be on the cleavable nucleotide if the cleavage mechanism results in separation of the dye from the primer as, for instance, in the case of 5′-amino substitution which results in cleavage proximal to the sugar and base of the nucleotide; (2) cleavage at the cleavable modified nucleotide; (3) purification to remove free nucleotides; and (4) FRET detection.

As noted earlier in this disclosure, we have demonstrated thatpolynucleotides' containing 7-nitro-7-deaza-2′-deoxyadenosine in placeof 2′-deoxyadenosine may be specifically and completely cleaved usingpiperidine/TCEP/Tris base. There are many other examples of chemistrieswhere such PCR amplification and chemical cleavage may be possible. In aputative genotyping assay, a PCR reaction is carried out with onecleavable nucleotide analogue along with three other nucleotides. ThePCR primers may be designed such that the polymorphic base is near oneof the primers (P) and there is no cleavable base between the primer andthe polymorphic base. If the cleavable base is one of the polymorphicbases, the P-containing cleavage product from this allele is expected tobe shorter than the product from the other allele. The schematicpresentation (FIG. 27) and experimental data (FIGS. 28 to 31) areexamples of this arrangement. If the cleavable base is different fromeither of the polymorphic bases, the P-containing fragment would havethe same length, but different molecular weight for the two alleles. Inthis case, Mass Spectrometry would be the preferred analytical tool;although we had observed that oligonucleotides with one single basedifference may migrate differently when analyzed by capillaryelectrophoresis. In one specific example, an 82 bp fragment ofTransferrin Receptor gene was amplified by PCR using7-nitro-7-deaza-2′-deoxyadenosine in place of 2′-deoxyadenosine. Thepolymorphic base pair is A:T to G:C. The PCR amplification generatedfully substituted product in similar yields to that of natural DNA (FIG.28). MALDI-TOF Mass Spectrometry analysis revealed the polymorphism intwo regions of the spectra, the first between 7000 Da and 9200 Da andthe second between 3700 Da and 4600 Da (FIG. 30, panel A). The firstregion demonstrated the difference in primer-containing fragments ofdifferent lengths (FIG. 30. panel B). 
The second region showed the opposite strand of DNA containing the polymorphism that has the same length but different mass (FIG. 30, panel C). The common fragments between the two alleles may serve as mass references. Capillary electrophoresis analysis may also be used (FIG. 31). The mobility difference between the two fragments of different length was easily detected in the test sample, as expected. In addition, a mobility difference between two polymorphic fragments (11 nt) of the same length but one different base (C vs. T) was observed, providing supporting evidence from the opposite strand. FIG. 32 illustrates schemes for FRET detection of the same polymorphic site.

b. Full Substitution, Full Extension and Complete Cleavage at Dinucleotides

In another aspect of the present invention, two of the four nucleotides of which the subject polynucleotide is composed are completely replaced with modified nucleotides (either on one strand using primer extension, or on both strands using a DNA amplification procedure) and substantially complete cleavage is then effected preferentially at the site of dinucleotides involving the two different modified nucleotides. Generally, given the steric constraints of most cleavage mechanisms, the two modified nucleotides will be cleaved only when they occur in a specific order. For example, if T and C are modified, the sequence 5′ TpC 3′ would be cleaved but 5′ CpT 3′ would not (5′ and 3′ indicate the polarity of the polynucleotide strand and p indicates an internal phosphate group).
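The order dependence of dinucleotide cleavage can be illustrated with a minimal Python sketch. The function name `cleave_at_dinucleotide` and the example sequence are hypothetical, and the break is modeled, as an assumption, as falling between the two modified nucleotides; the actual position of the break will depend on the cleavage chemistry employed.

```python
from typing import List


def cleave_at_dinucleotide(seq: str, first: str, second: str) -> List[str]:
    """Split a 5'->3' sequence wherever `first` is immediately followed by `second`.

    Assumption (not from the disclosure): the strand breaks between the two
    nucleotides of the dinucleotide. The reverse order (second-then-first)
    is left intact, mirroring the TpC-versus-CpT example in the text.
    """
    seq = seq.upper()
    fragments = []
    start = 0
    for i in range(len(seq) - 1):
        if seq[i] == first and seq[i + 1] == second:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


# Hypothetical 12-mer containing two TpC sites and one CpT site.
example = "AATCGGCTTTCA"
fragments = cleave_at_dinucleotide(example, "T", "C")
print(fragments)  # ['AAT', 'CGGCTTT', 'CA']
```

In this sketch the CpT at positions 7-8 remains inside the middle fragment, while both TpC sites produce breaks.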

The rationale for dinucleotide cleavage is that mononucleotide cleavage is not ideally suited to the analysis of polynucleotides longer than 300 to 400 nucleotides because the number of fragments that must be detected and resolved by the mass spectrometer may become limiting and the likelihood of coincidental occurrence of two or more cleavage fragments with the same mass increases and begins to limit the efficiency of the method. This latter problem is especially acute with respect to the occurrence of mono-, di-, tri- and tetranucleotides of the same composition, which can mask the appearance or disappearance of fragments because MS is not quantitative. In contrast, capillary electrophoresis, while not providing mass and thereby nucleotide content, is a quantitative method that allows detection of variation in the numbers of di-, tri- and tetranucleotides.

Cleavage at modified dinucleotides should result in fragments averaging sixteen nucleotides in length. This is because the abundance of any given dinucleotide, given four nucleotides, is one in 4² (that is, one in 16), assuming nucleotide frequencies are equal and there is no biological selection imposed on any class of dinucleotides (i.e., their occurrence is random). Neither of these assumptions is completely accurate, however, so there will in actuality be a wide size distribution of cleavage masses, with considerable deviation in the average fragment mass depending on which nucleotide pair is selected for substitution and cleavage. However, available information concerning the frequency of various dinucleotides in mammalian, invertebrate and prokaryotic genomes can be used to select appropriate dinucleotides. It is well known, for example, that 5′ CpG 3′ dinucleotides are underrepresented in mammalian genomes; they can be avoided if relatively frequent cleavage intervals are desired.
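The sixteen-nucleotide average can be reproduced with a short calculation. The following Python sketch assumes, as the text does, that nucleotides occur independently, so that the expected spacing between cleavage sites is the reciprocal of the dinucleotide frequency. The function name and the skewed frequencies in the second example are hypothetical and are not measured values.

```python
from typing import Dict


def expected_fragment_length(freqs: Dict[str, float], first: str, second: str) -> float:
    """Expected spacing (in nucleotides) between occurrences of an ordered dinucleotide.

    Assumes independent nucleotide occurrence, so the dinucleotide frequency
    is the product of the two single-nucleotide frequencies.
    """
    p = freqs[first] * freqs[second]
    if p <= 0:
        raise ValueError("dinucleotide frequency must be positive")
    return 1.0 / p


# Equal frequencies: 1 / (0.25 * 0.25) = 16
equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(expected_fragment_length(equal, "T", "C"))  # 16.0

# Hypothetical skewed composition, for illustration only.
skewed = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
print(round(expected_fragment_length(skewed, "C", "G"), 6))  # 25.0
```

The independence assumption does not capture biological selection against particular dinucleotides, so observed dinucleotide counts for the genome of interest would be a better input where they are available.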

i. Applications to Variance Detection

If the sequence of the analyte polynucleotide is known, then an optimal dinucleotide cleavage scheme can be selected based on analysis of the masses of predicted cleavage fragments. For example, cleavage fragments that fall within the size range optimal for analysis by mass spectrometry can be selected by analysis of the fragment sizes produced by all possible dinucleotide cleavage schemes. Further, the theoretical efficiency of variance detection associated with all possible dinucleotide cleavage schemes can be determined as described above for full mononucleotide substitution and cleavage, that is, by determining the detectability of every possible nucleotide substitution in the entire analyte fragment. In some cases two or more independent dinucleotide cleavage reactions may produce complementary results, or a second dinucleotide cleavage experiment may be run to provide corroboration.

Given the length of dinucleotide cleavage fragments (16 mers on the average), it will often not be possible to determine with precision the location of a variant nucleotide based on one dinucleotide cleavage experiment. For example, if a 15 Dalton mass difference between samples is detected in a 14 mer then there must be a C<->T variance (Table 2) in the 14 mer, with the heavier alleles containing T at a position where the lighter alleles contain C. However, unless there is only one C in the lighter variant fragment, or only one T in the heavier variant fragment, it is impossible to determine which C or T is the variant one. This ambiguity regarding the precise nucleotide that varies can be resolved in several ways. First, a second mono- or dinucleotide substitution and cleavage experiment, or a combination of such cleavage experiments, may be designed so as to divide the original variant fragment into pieces that will allow unambiguous assignment of the polymorphic residue. Second, an alternative sequencing procedure may be used as an independent check on the results, such as Sanger sequencing or sequencing by hybridization.
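The positional ambiguity described above can be made concrete with a minimal Python sketch. The function name and both 14 mer sequences are hypothetical; the only fact taken from the text is that a C<->T variance places the variant at some C of the lighter allele (or some T of the heavier allele).

```python
from typing import List


def candidate_positions(fragment: str, base: str) -> List[int]:
    """Return the 1-based positions at which `base` occurs in the fragment.

    For a C<->T variance detected by mass alone, every C in the lighter
    allele is an equally valid candidate for the variant position.
    """
    return [i + 1 for i, b in enumerate(fragment.upper()) if b == base.upper()]


# Hypothetical lighter-allele 14 mer with two C residues: position is ambiguous.
ambiguous = "AGCTTAGGCATAGA"
print(candidate_positions(ambiguous, "C"))  # [3, 9]

# Hypothetical lighter-allele 14 mer with a single C: position is determined.
unambiguous = "AGGTTAGGCATAGA"
print(candidate_positions(unambiguous, "C"))  # [9]
```

When more than one candidate position is returned, a second cleavage scheme that separates the candidates into different fragments would be one way to resolve the assignment, as the text describes.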

ii. Applications to DNA Sequencing

As a stand-alone procedure, dinucleotide substitution and cleavage can provide useful information concerning nucleotide content of DNA fragments averaging about 16 nucleotides in length, but ranging up to 30, 40 or even 50 or more nucleotides. However, as described above, the main applications of dinucleotide cleavage to DNA sequencing occur in conjunction with mononucleotide cleavage. The comparatively large DNA fragments produced by dinucleotide cleavage can be very useful in sorting the smaller fragments produced by mononucleotide cleavage into sets of fragments which must fit together. The additional constraints imposed by these groupings can be sufficient to allow complete sequence to be determined from even relatively large fragments.

In Example 4 the steps required to infer a nucleotide sequence from a 20 mer using four mononucleotide substitution and cleavage reactions are shown. The procedures described in Example 4 could be carried out on a series of 10-30 mers, the sequence content of which was initially defined, or at least constrained, by a dinucleotide cleavage procedure. Thereby, the sequence of a much larger fragment can be obtained. Note that as nucleotide length increases the relationship between fragment mass and sequence content becomes more ambiguous; that is, there are more and more possible sequences that could produce the given mass. However, if the number of nucleotides comprising the mass is known, the number of possible nucleotide contents falls significantly (Pomerantz, S. C., et al., J. Am. Soc. Mass Spectrom., 1993, 4: 204-209). Further, sequence constraints, such as the lack of internal dinucleotide sequences of a particular type, further reduce the number of possible nucleotide contents, as illustrated in Table 4 for mononucleotide sets.
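The effect of knowing the fragment length can be illustrated by enumerating the nucleotide contents consistent with a measured mass. The Python sketch below is an illustration only: the residue masses are approximate average masses of the four deoxynucleotide residues within a strand, end-group contributions are ignored, and the function names, tolerance and search limit are hypothetical choices rather than values from this disclosure.

```python
from typing import List, Optional, Tuple

# Approximate average residue masses in Da (assumption; end groups ignored).
RESIDUE_MASS_DA = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

Composition = Tuple[int, int, int, int]  # counts of (A, C, G, T)


def composition_mass(comp: Composition) -> float:
    """Sum of residue masses for a given (A, C, G, T) count tuple."""
    a, c, g, t = comp
    return (a * RESIDUE_MASS_DA["A"] + c * RESIDUE_MASS_DA["C"]
            + g * RESIDUE_MASS_DA["G"] + t * RESIDUE_MASS_DA["T"])


def compositions_matching(mass_da: float, tol_da: float, max_len: int,
                          length: Optional[int] = None) -> List[Composition]:
    """List compositions of up to max_len nucleotides within tol_da of mass_da.

    If `length` is given, only compositions of exactly that many
    nucleotides are kept.
    """
    hits = []
    for a in range(max_len + 1):
        for c in range(max_len + 1 - a):
            for g in range(max_len + 1 - a - c):
                for t in range(max_len + 1 - a - c - g):
                    n = a + c + g + t
                    if n == 0 or (length is not None and n != length):
                        continue
                    if abs(composition_mass((a, c, g, t)) - mass_da) <= tol_da:
                        hits.append((a, c, g, t))
    return hits


# Hypothetical measurement: the mass of a 20 mer with five of each nucleotide.
target = composition_mass((5, 5, 5, 5))
any_length = compositions_matching(target, tol_da=6.0, max_len=25)
known_length = compositions_matching(target, tol_da=6.0, max_len=25, length=20)
print(len(any_length), len(known_length))
```

Under these assumptions the list restricted to 20 mers is a strict subset of the unrestricted list; for instance, a 21 mer of 14 C and 7 T falls within the same mass window but is excluded once the length is known.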

c. Full Substitution with Modified Nucleotide and Partial Cleavage; Partial Substitution with Modified Nucleotide and Full Cleavage; Partial Substitution with Modified Nucleotide and Partial Cleavage

These applications provide partially cleaved polynucleotides by different strategies; each of these procedures has utility in specific embodiments of the invention. However, full substitution with a modified nucleotide and partial cleavage is the preferred method of producing partial cleavage products for mass spectrometric analysis. The reason is that with full substitution one can vary the degree of partial cleavage over a very wide spectrum, from cleavage of 1 in 100 nucleotides to cleavage of 99 in 100 nucleotides. Partial substitution, even with full cleavage, does not allow this range of cleavage completeness. However, for modified nucleotides that are not efficiently incorporated by polymerases, lesser degrees of substitution are preferred. As the completeness of cleavage is reduced, the relationship between cleavage fragments over a longer and longer range becomes evident. On the other hand, as the completeness of cleavage is increased, the ability to obtain precise mass data and unambiguous assignment of nucleotide content is increased. The combination of slight, intermediate and substantial cleavage provides an integrated picture of an entire polynucleotide, whether the application is variance detection or sequencing. The small polynucleotides of defined nucleotide content can be joined into larger and larger groups of defined order.

Partial substitution with full cleavage and partial substitution with partial cleavage are useful for the preparation of sequencing ladders. If a modified nucleotide is not efficiently incorporated into polynucleotides by available polymerases, then a low ratio of partial substitution may be optimal for efficient production of polynucleotides containing the modified nucleotide. However, a low degree of substitution may then require complete cleavage in order to produce sufficient cleavage fragments for ready detection.

Partial substitution with partial cleavage is generally a preferred approach, as conditions for complete cleavage may be harsh and thereby result in some nonspecific cleavage or modification of polynucleotides. Also, partial substitution at relatively high levels (i.e., at 5% or more of the occurrences of the nucleotide) allows a range of partial cleavage efficiencies to be analyzed. As with MS analysis, there are advantages to being able to test multiple degrees of cleavage. For example, it is well known in Sanger sequencing that there are tradeoffs to production of very long sequence ladders: generally the beginning of the ladder, with the shortest fragments, is difficult to read, as is the end of the ladder with the longest fragments. Similarly, the ability to manipulate partial cleavage conditions with the polynucleotides of this invention will allow a series of sequencing ladders to be produced from the same polynucleotide that provide clear sequence data close to the primer or at some distance from the primer. As shown in FIG. 17, sequence ladders produced by chemical cleavage have a much better distribution of labeled fragments than dideoxy termination over distances up to 4 kb and beyond.

Partial cleavage may also be obtained by the substitution of cleavage-resistant modified nucleotides, described above, for all but one natural nucleotide, which then provides the cleavage sites. In addition, as described previously, combinations of cleavage-resistant modified nucleotides and cleavage-sensitive modified nucleotides may be used.

While any technique which permits the determination of the mass of relatively large molecules without causing non-specific disintegration of the molecules in the process may be used with the methods of this invention, a preferred technique is MALDI mass spectroscopy since it is well suited to the analysis of complex mixtures of analyte. Commercial MALDI instruments are available which are capable of measuring mass with an accuracy on the order of 0.1% to 0.05%. That is, these instruments are capable of resolving molecules differing in molecular weight by as little as one part in two thousand under optimal conditions. Advances in MALDI MS technology will likely increase the resolution of commercial instruments in the next few years. Considering the smallest difference that can occur between two strands containing a variance (an A-T transversion, a molecular weight difference of 9; see Table 5), and given a MALDI apparatus with a resolution of 2,000 (that is, a machine capable of distinguishing an ion with an m/z (mass/charge) of 2,000 from an ion with an m/z of 2,001), the largest DNA fragment in which the A-T transversion would be detectable is approximately 18,000 Daltons (a ‘Dalton’ is a unit of molecular weight used when describing the size of large molecules; for all intents and purposes it is equivalent to the molecular weight of the molecule). In the experimental setting, the practical resolving power of an instrument may be limited by the isotopic heterogeneity of carbon (i.e., carbon exists in nature as Carbon-12 and Carbon-13), as well as by other factors. Assuming an approximately even distribution of the four nucleotides in the DNA fragment, the 18,000 Dalton limit translates to detection of an A-T transversion in an oligonucleotide containing about 55 nucleotides. At the other end of the spectrum, a single C-G transversion, which results in a molecular weight difference of 40, could be detected using MALDI mass spectroscopy in an oligonucleotide consisting of about 246 nucleotides. 
The size of an oligonucleotide in which an A-T transversion would be detectable could be increased by substituting a heavier non-natural nucleotide for either the A or the T; for example, without limitation, replacing A with 7-methyl-A, thus increasing the molecular weight change to 23. Table 5 shows the approximate size of an oligonucleotide in which each possible single point mutation could be detected for mass spectrometers of different resolving powers without any modification of molecular weight.
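The arithmetic behind the figures of 18,000 Daltons, 55 nucleotides and 246 nucleotides can be checked with a short Python sketch. The average nucleotide mass of 325 Da is an assumption chosen to correspond to an approximately even distribution of the four nucleotides; the function names are hypothetical.

```python
# Assumed average mass per nucleotide for an even base distribution (Da).
AVG_NUCLEOTIDE_MASS_DA = 325.0


def max_detectable_mass(delta_da: float, resolution: float) -> float:
    """Largest mass at which a difference of delta_da is still resolved.

    An instrument of resolution R separates m from m + m/R, so a mass
    difference delta_da is resolvable up to m = delta_da * R.
    """
    return delta_da * resolution


def max_detectable_length(delta_da: float, resolution: float,
                          avg_mass_da: float = AVG_NUCLEOTIDE_MASS_DA) -> int:
    """Approximate oligonucleotide length corresponding to max_detectable_mass."""
    return int(max_detectable_mass(delta_da, resolution) // avg_mass_da)


print(max_detectable_mass(9, 2000))     # 18000  (A-T transversion)
print(max_detectable_length(9, 2000))   # 55
print(max_detectable_length(40, 2000))  # 246    (C-G transversion)
```

The same functions can be applied to a mass-modified nucleotide by passing the larger mass difference, for example 23 Da for the 7-methyl-A substitution mentioned above.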

A variety of chemical modifications of nucleotides have been described with respect to their utility in increasing the detectability of mass differences during MS analysis. A particularly useful mass modification for use with the methods of this invention is the purine analog 2-chloroadenine, which has a mass of 364.5. As shown in Table 2, Panel B, this has a favorable effect on mass differences between all the nucleotides and A. Most important, it changes the T-A difference from 9 Da to 42.3 Da. Further, it has been shown that 2-chloroadenine can be incorporated in polynucleotides by DNA polymerase from Thermus aquaticus. Full substitution on one strand has been described (Hentosh, P., Anal. Biochem., 1992, 201: 277-281).

E. EXAMPLES

1. Polymerase Development

A variety of mutant polymerases have been shown to have altered catalytic properties with respect to modified nucleotides. Mutant polymerases with reduced discrimination between ribonucleotides and deoxyribonucleotides have been extensively studied. Human DNA polymerase β mutants that discriminate against azidothymidine (AZT) incorporation have been isolated by genetic selection. Thus, it is highly likely that mutant polymerases capable of incorporating any of the modified nucleotides of this invention better than natural polymerases can be produced and selected.

The following procedure can be employed to obtain an optimal polymerase for incorporation of a particular modified nucleotide or nucleotides into a polynucleotide. It is understood that modifications of the following procedure will be readily apparent to those skilled in the art; such modifications are within the scope and spirit of this invention.

a. A starting polymerase is selected. Alternatively, multiple polymerases that have different sequences and/or different capabilities with regard to incorporation of a modified nucleotide or nucleotides into a polynucleotide might be selected. For example, without limitation, two polymerases, one of which efficiently incorporates a nucleotide having a sugar modification and the other of which efficiently incorporates a nucleotide having a phosphate backbone modification, might be selected. The coding sequences of the polymerase(s) are then cloned into a prokaryotic host.

It may be advantageous to incorporate a protein tag in the polymerase during cloning, the protein tag being selected for its ability to direct the polymerase into the periplasmic space of the host. An example, without limitation, of such a tag is thioredoxin. Proteins in the periplasmic space can be obtained in a semi-pure state by heat shock (or other procedures known in the art) and are less likely to be incorporated into inclusion bodies.

b. Several (preferably three or more) rounds of shuffling (Stemmer, supra) are then performed.

c. After each round of shuffling, the shuffled DNA is transformed into a host. The library of transformants obtained is then plated and pools of transformants (approximately 10-1000 colonies per pool) are prepared from the host cell colonies for screening by sib selection. A lysate is then made from each pool. The host may be a prokaryote such as, without limitation, a bacterium, or a single-celled eukaryote such as yeast. The following description assumes the use of a bacterial prokaryotic host but other possible prokaryotic hosts will be apparent to those skilled in the art and are within the scope and spirit of this invention.

d. The lysates are subjected to dialysis using a low molecular weight cut-off membrane to remove substantially all natural nucleotides. This is necessary because the assay for polymerase with the desired characteristics entails polymerase extension of a primer in the presence of modified nucleotides. The presence of the corresponding natural nucleotides will result in a high background in the assay that might obscure the results. An alternative procedure is degradation of all natural nucleotides with a phosphatase such as shrimp alkaline phosphatase followed by heat inactivation of the phosphatase.

e. Add the following to the dialyzed lysate: a single stranded DNA template, a single stranded DNA primer complementary to one end of the template, the modified nucleotide or nucleotides whose incorporation into the DNA is desired and the natural nucleotides which are not being replaced by the modified nucleotides. If the desired polymerase is to have the capability of incorporating two contiguous modified nucleotides, then the template should be selected to contain one or more complementary contiguous sequences. For example, without limitation, if a polymerase which is capable of incorporating a modified-C-modified-T sequence 5′ to 3′ is desired, the template should contain one or more G-A or A-G sequences 3′ to 5′. Following (that is, 5′ to) the segment of the template strand designed to test the ability of the polymerase to incorporate the modified nucleotide or nucleotides is a segment of template strand that produces a detectable sequence when copied by the polymerase. The sequence can be detected in several ways. One possibility is to use a template having a homopolymeric segment of nucleotides complementary to one of the natural nucleotides. Then, if the goal is, for example, identification of a polymerase that incorporates modified C, detection might entail polymerization of a consecutive series of A, G or T, provided, however, that the nucleotide used for detection does not occur earlier in the polymerized sequence complementary to the template sequence. The detection nucleotide could be a radiolabeled or dye-labeled nucleotide that would only be incorporated by a mutant polymerase that had already traversed the segment of template requiring incorporation of the modified nucleotide(s). Another way to detect the homopolymer would be to make a complementary radiolabeled or dye-labeled probe that could be hybridized to the homopolymer produced only in those pools containing a polymerase capable of incorporating the modified nucleotide(s). 
Hybridization could then be detected by, for example, spotting the primer extension products from each pool on a nylon filter, followed by denaturing, drying and addition of the labeled homopolymeric probe, which would hybridize to the complementary strand of the polymerization product. Of course, a homopolymer or other sequence not present in the host cell genome or an episome should be used to minimize background hybridization to host sequences present in all the pools.

Yet another detection procedure would be to incorporate a sequence corresponding to an RNA polymerase promoter, such as, without limitation, the T7 promoter, followed by a reporter sequence into the template. These sequences should be located downstream (3′ to) the primer and template sequence requiring incorporation of modified nucleotides. The T7 promoter will be inactive until it becomes double-stranded as a consequence of the polymerization; however, polymerization of the T7 promoter sequence will only occur if the mutant polymerase being tested is capable of incorporating the modified nucleotide or sequence of modified nucleotides which lie upstream of the T7 promoter sequence. The reporter sequence may include a homopolymeric sequence of a nucleotide (e.g., T) the complement of which (in this case, A) is labeled with a dye or radioactive label. In this manner, high levels of T7 polymerase mediated transcription will result in large quantities of high molecular weight (i.e., capable of precipitation by trichloroacetic acid), labeled polymer. An alternative reporter sequence might be a ribozyme capable of cleaving an exogenously added marker oligonucleotide which permits easy distinction of cleaved from non-cleaved products. For example, again without limitation, one end of the oligonucleotide might be biotinylated and the other end might contain a fluorescent dye. Such systems are capable of 1000-fold or greater amplification of a signal. In this approach it would first be necessary to demonstrate that the function of the promoter is not disturbed by the presence of modified nucleotide or to create a version of the promoter that lacks the nucleotide being modified.

f. Any pool of lysed bacterial colonies which contains a polymerase capable of incorporating the selected modified nucleotide or contiguous modified nucleotides will produce detectable homopolymer or will contain double-stranded T7 RNA polymerase promoter upstream of a marker sequence as the result of the polymerization across the modified nucleotide or contiguous nucleotides, across the T7 promoter and across the marker sequence. Addition of T7 RNA polymerase to the mixture (or, alternatively, expression of T7 RNA polymerase from a plasmid) will result in transcription of the marker sequence, which then can be detected by an appropriate method depending on the marker system selected. It may not be necessary to select or design a promoter which either lacks the modified nucleotide(s) or which can function effectively with the modified nucleotide(s).

g. Bacterial colonies containing a polymerase having the desired properties are then identified and purified from pools of bacterial colonies by sib selection. In each round of selection the pool or pools with the desired properties are split into sub-pools and each sub-pool is tested for activity as set forth above. The sub-pool displaying the highest level of activity is selected and separated into a second round of sub-pools and the process repeated. This is repeated until there is only one colony remaining which contains the desired polymerase. That polymerase can then be recloned into a protein expression vector and large amounts of the polymerase can be expressed and purified.
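The sib selection loop of step g can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `sib_select`, the number of sub-pools per round, and the additive pooled-activity assay are all hypothetical, and a real assay would be one of the detection procedures set forth above.

```python
from typing import Callable, List, Sequence


def sib_select(colonies: Sequence[int],
               assay: Callable[[List[int]], float],
               n_subpools: int = 4) -> int:
    """Repeatedly split the most active pool until one colony remains.

    `assay` is a caller-supplied function returning the measured activity
    of a pool of colonies (hypothetical stand-in for the lysate assay).
    """
    if not colonies:
        raise ValueError("at least one colony is required")
    if n_subpools < 2:
        raise ValueError("n_subpools must be at least 2")
    pool = list(colonies)
    while len(pool) > 1:
        size = -(-len(pool) // n_subpools)  # ceiling division
        subpools = [pool[i:i + size] for i in range(0, len(pool), size)]
        # Keep the sub-pool displaying the highest level of activity.
        pool = max(subpools, key=assay)
    return pool[0]


# Hypothetical library of 1000 colonies in which only colony 437 is active.
activity = {colony: 0.0 for colony in range(1000)}
activity[437] = 1.0


def pooled_activity(pool: List[int]) -> float:
    # Assumption: pooled activity is the sum of individual activities.
    return sum(activity[c] for c in pool)


print(sib_select(range(1000), pooled_activity))  # 437
```

With four sub-pools per round, a pool of 1000 colonies is reduced to a single colony in five rounds in this sketch; the number of sub-pools trades the number of assays per round against the number of rounds.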

Another approach to polymerase development involves the well-known propensity for some antibiotics to kill only growing cells, e.g., penicillin and related drugs, which kill by interfering with bacterial cell wall synthesis of growing cells but do not affect quiescent cells.

The approach would be to introduce a modified nucleotide into bacterial cells which have been genetically altered to express one or more mutant polymerases, preferably a library of mutant polymerases. An ideal host strain would be one in which the endogenous polymerase has been inactivated but is complemented by a plasmid-encoded polymerase. A library of polymerases could then be created on a second plasmid with a different selectable marker, e.g., antibiotic resistance. The library would then be introduced into the host cell in the presence of negative selection against the first (non-mutated) polymerase-encoding plasmid, leaving cells with only the mutant polymerases. If one or more of the mutant polymerases is capable of incorporating the modified nucleotide into the genetic material of the cells, the expression of the modified gene(s) will be altered and/or a series of host cell responses will be elicited, such as the SOS response, which affects cell growth. The effect sought would be reversible growth arrest, i.e., a cytostatic rather than cytocidal effect. The cells would then be treated with an antibiotic that only kills actively growing cells. The cells are then removed from the presence of the antibiotic and placed in fresh growth medium. Any cells whose growth was arrested by the incorporation of the modified nucleotide into their genetic material, and which therefore were unaffected by the antibiotic, would form colonies. Plasmids containing the code for the polymerase which catalyzed the incorporation of the modified nucleotide into the cells' genetic material are then isolated and the procedure repeated for additional rounds of selection. Once a sufficient number of selection rounds have been performed, the polymerase is isolated and characterized. An exemplary, but by no means limiting, experimental procedure which might be employed to accomplish the foregoing is as follows:

1. Select a polymerase or set of polymerases for mutagenesis. The starting polymerase(s) may include, without limitation, a mutant polymerase such as Klenow E710A, wild type polymerases, thermostable or thermolabile polymerases or polymerases known to complement E. coli DNA Pol I, etc.

2. Prepare a library of mutant polymerases using techniques such as “dirty PCR,” shuffling, site-directed mutagenesis or other diversity-generating procedures.

3. Clone the library into a plasmid vector.

4. Transform bacteria with the plasmid library and isolate transformants by selection on an appropriate antibiotic. Preferably, the host strain has an inactivated chromosomal polymerase and selection can be applied to ensure that only the mutant polymerases are expressed in the host cells, as described above. Only cells harboring plasmids encoding functional polymerases will survive this step.

5. Add the modified nucleotide triphosphate to the media. It may be necessary to use a cell permeabilizing procedure such as electroporation, addition of calcium or rubidium chloride, heat shock, etc. to facilitate entrance of the modified nucleotide into the cells. The cells are then grown in the presence of the modified nucleotide triphosphate until incorporation of the modified nucleotide(s) induces arrest of cell growth in selected cells.

6. Add penicillin, ampicillin, nalidixic acid or any other antibiotic that selectively kills actively dividing cells. Continue growing the cells for a selected time.

7. Spin the cells out, suspend them in fresh LB media and plate them. Grow for an empirically determined time.

8. Select colonies, isolate the plasmids and repeat steps 4 to 7 for additional rounds of selection or, in the alternative, use a biochemical assay for incorporation of the modified nucleotide to examine individual colonies or pools of colonies. Such an assay might entail polymerization of a template in the presence of radiolabeled modified nucleotide on individual clones or on pools of clones in a sib selection scheme.

9. Further characterize the polymerase(s) determined to have the desired activity by the assay of step 8.

10. Remutagenize the polymerase(s) obtained in Step 8 and repeat the selection procedure from Step 3.

11. When an acceptable level of ability to incorporate the modified nucleotide is achieved, isolate and characterize the polymerase.

Another method for selecting active polymerases for incorporation of modified nucleotides involves use of a bacteriophage system which has been described for selection of an active enzyme (Pedersen et al., Proc. Natl. Acad. Sci. USA, 1998, 95:10523-8). A modification of that procedure might be used for mutant polymerase selection. That is, oligonucleotides which are covalently attached to phage surfaces can be extended by mutant polymerases expressed on the surface of the phage. Dye-labeled modified nucleotides would be used for primer extension. After removal of unincorporated nucleotides, the phage bearing dye-modified nucleotide could be identified using fluorescence-activated cell sorting procedures. Alternatively, using an appropriate template design, the fluorescence label can be attached to another nucleotide that would only be incorporated downstream of a stretch of modified nucleosides.

Yet another approach to identifying active polymerases for modified nucleotide incorporation would use available X-ray crystal structures of polymerases bound to template DNA and nucleotide substrate. Based on observed or predicted interactions within the polymerase/substrate complex, rational amino acid changes could be created to accommodate the structural deviation in a given modified nucleotide. For example, based on the structural information on a complex of T7 polymerase and its substrates, for which the X-ray crystal structure shows the amino acids that are in the polymerase active site (Doublie et al., Nature, 1998, 391:251-258), site-directed mutagenesis might be designed for the structurally similar protein Klenow to increase its specific activity for incorporation of ribonucleotides (rNTPs) and/or 5′-amino-nucleotides (5′-amino dNTPs).

The E710A mutant of Klenow (Astatke et al., Proc. Nat. Acad. Sci. USA, 1998, 95:3402-3407) has an increased capacity to incorporate rNTPs as compared to wild type Klenow, probably because the mutation removes the steric gate against the 2′-hydroxyl group of rNTPs. This mutation, however, decreased the mutant's activity for incorporation of natural dNTPs and 5′-amino dNTPs. In this case, use of the E710S mutation might lead to improved activity because E710S might possibly H-bond with the 2′-OH of rNTP substrates. The E710A or E710S mutation might also be used in combination with Y766F, a previously described mutant which by itself has little effect on polymerase activity (Astatke et al., J. Biol. Chem., 1995, 270: 1945-54). The crystal structure reveals that the hydroxyl of Y766 forms hydrogen bonds with the side chain of E710, which might affect polymerase activity when E710 is truncated to Ala. On the other hand, E710 mutations in combination with F762A might improve activity by holding the sugar ring in a defined position. Similarly, better incorporation of the 5′-amino-analogs might be achieved by relaxing the binding of the polymerase on the nucleotide substrate since the 5′-nitrogen changes the conformation of the nucleotide and thus the alignment of the alpha-phosphorus atom. Initially, the focus could be on mutagenesis of a limited number of residues that engage the sugar and phosphates of the nucleotide substrate such as residues R668, H734, and F762. The H881 residue might also work. Although it is further from the dNTP binding site, an Ala substitution at this position influences the fidelity of dNTP incorporation (Polesky et al., J. Biol. Chem., 1990, 265:14579-91). These residues could be targeted for cassette mutagenesis to ascertain the amino acid residue with maximized effect, followed by selection for active polymerases as described. 
The R668K substitution is particularly interesting, because it should eliminate contact to the dNTP while preserving the minor groove interaction with the primer 3′-NMP. On the other hand, although R754 and K758 contact the beta and alpha phosphates, changes at these positions are likely to severely impair catalysis. Histidine or lysine at these positions could preserve interactions with the phosphates and might retain activity.

Another method for selecting active polymerases for incorporation of modified nucleotides involves use of the phage display system, which allows foreign proteins to be expressed on the surface of bacteriophage as fusions with phage surface proteins (Kay, B. K., Winter, J. and J. McCafferty (Editors), Phage Display of Peptides and Proteins: A Laboratory Manual, Academic Press, 1996). Establishing an experimental system for detection of a mutant polymerase would entail expressing mutant polymerases on the surface of a library of phage, and subsequently isolating phage bearing polymerases with the desired polymerase activity. Aspects of such a system have been described for selection of an active nuclease (Pedersen et al., Proc. Natl. Acad. Sci. USA, 1998, 95:10523-8). A modification of that procedure might be used for mutant polymerase selection. That is, oligonucleotides which are covalently attached to proteins on the phage surface can be extended by mutant polymerases expressed on the surface of the same phage. The oligonucleotides must fold up to provide a primer-template complex recognizable by the polymerase, or alternatively a primer complementary to the oligonucleotide can be provided separately. In either event, the portion of the oligonucleotide serving as a template for polymerization will contain nucleotides complementary to the modified nucleotide(s) for which an efficient polymerase is being sought. The template oligonucleotide may also be designed so that the extension product is easily detectable as a result of templated incorporation of a labeled nucleotide that occurs only after polymerization across the segment of template requiring incorporation of the modified nucleotide(s). 
One method for selectively enriching phage bearing polymerases with the desired catalytic properties involves use of a fluorescence activated cell sorter (FACS). Dye-labeled nucleotides would be incorporated in a primer extension reaction only after incorporation of the test modified nucleotide(s). After removal of unincorporated nucleotides, the phage bearing polymerase with attached dye-labeled nucleotides (which must encode mutant polymerases capable of incorporating the modified nucleotide or nucleotides) could be enriched in one or more rounds using fluorescence activated cell sorting procedures (Daugherty P. S., et al., Antibody affinity maturation using bacterial surface display. Protein Eng 11:825-32, 1998). Alternatively, the modified nucleotide(s) themselves can be labeled with dye and detection will similarly be accomplished by FACS sorting of dye-labeled phage. This procedure has the disadvantage that the dye may interfere with polymerization; however, one skilled in the art will recognize that the dye can be attached to the modified nucleotide via a linkage that is unlikely to inhibit polymerization. Using an appropriate template design, the fluorescent label can instead be attached to another nucleotide which would only be incorporated downstream of a stretch of modified nucleotides.

Yet another approach to identifying active polymerases for modified nucleotide incorporation would be to use available X-ray crystal structures of polymerases bound to template DNA and nucleotide substrate. Based on observed or predicted interactions within the polymerase/substrate complex, rational amino acid changes could be created to accommodate the structural deviation of a given modified nucleotide. For example, based on the structural information on a complex of T7 polymerase and its substrates, for which the X-ray crystal structure shows the amino acids that are in the polymerase active site (Doublie et al., Nature, 1998, 391:251-258), site-directed mutagenesis might be designed for the structurally similar protein Klenow to increase its specific activity for incorporation of ribonucleotides (rNTPs) and/or 5′-amino-nucleotides (5′-aminodNTPs).

The E710A mutant of Klenow (Astatke et al., Proc. Natl. Acad. Sci. USA, 1998, 95:3402-3407) has an increased capacity to incorporate rNTPs as compared to wild type Klenow, probably because the mutation removes the steric gate against the 2′-hydroxyl group of rNTPs. This mutation, however, decreased the mutant's activity for incorporation of natural dNTPs and 5′-aminodNTPs. In this case, use of the E710S mutation might lead to improved activity because E710S might possibly H-bond with the 2′-OH of rNTP substrates. The E710A or E710S mutation might also be used in combination with Y766F, a previously described mutant which by itself has little effect on polymerase activity (Astatke et al., J. Biol. Chem., 1995, 270:1945-54). The crystal structure reveals that the hydroxyl of Y766 forms hydrogen bonds with the side chain of E710, which might affect polymerase activity when E710 is truncated to Ala. On the other hand, E710 mutations in combination with F762A might improve activity by holding the sugar ring in a defined position. Similarly, better incorporation of the 5′-amino-analogs might be achieved by relaxing the binding of the polymerase on the nucleotide substrate, since the 5′-nitrogen changes the conformation of the nucleotide and thus the alignment of the alpha-phosphorus atom. Initially, the focus could be on mutagenesis of a limited number of residues that engage the sugar and phosphates of the nucleotide substrate, such as residues R668, H734, and F762. The H881 residue might also work. Although it is further from the dNTP binding site, an Ala substitution at this position influences the fidelity of dNTP incorporation (Polesky et al., J. Biol. Chem., 1990, 265:14579-91). These residues could be targeted for cassette mutagenesis to ascertain the amino acid residue with maximized effect, followed by selection for active polymerases as described.
The R668K substitution is particularly interesting, because it should eliminate contact to the dNTP while preserving the minor groove interaction with the primer 3′-NMP. On the other hand, because R754 and K758 contact the beta and alpha phosphates, changes at these positions are likely to severely impair catalysis. Histidine or lysine at these positions could, however, preserve interactions with the phosphates and might retain activity.

One skilled in the art will recognize that the collection of preferred amino acid modifications to Klenow polymerase described above might be applied to other polymerases to produce useful mutant versions of those polymerases. This can be accomplished by aligning the amino acid sequences of the other polymerases with that of Klenow polymerase to determine the location of the corresponding amino acids in the other polymerases, and/or, where crystal structures are available, comparing three dimensional structures of other polymerases with that of Klenow polymerase to identify orthologous amino acids. Methods for performing site directed mutagenesis and expressing mutant polymerases in prokaryotic vectors are known in the art (Ausubel, F. M., et al., Current Protocols in Molecular Biology, John Wiley & Sons, 1998).
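By way of illustration only, the alignment-based mapping of a Klenow residue onto another polymerase might be sketched as follows. The function name, the toy aligned sequences and the numbering offsets are hypothetical and are not taken from any real polymerase; the sketch assumes a pairwise alignment has already been produced by some alignment tool, with gaps written as "-".

```python
def map_residue(aligned_ref, aligned_other, ref_position,
                ref_start=1, other_start=1):
    """Map a residue number in the reference sequence to the residue
    aligned with it in the other sequence.

    aligned_ref / aligned_other: equal-length alignment rows, gaps as '-'.
    ref_start / other_start: residue number of the first non-gap residue
    in each row (useful when the rows are excerpts of longer proteins).
    Returns (ref_residue, other_residue, other_position); the last two
    are None when the reference residue is aligned to a gap.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("alignment rows must have equal length")
    ref_num = ref_start - 1
    other_num = other_start - 1
    for ref_res, other_res in zip(aligned_ref, aligned_other):
        if ref_res != "-":
            ref_num += 1
        if other_res != "-":
            other_num += 1
        if ref_res != "-" and ref_num == ref_position:
            if other_res == "-":
                return (ref_res, None, None)
            return (ref_res, other_res, other_num)
    raise ValueError("reference position not covered by the alignment")


# Hypothetical toy alignment (not real polymerase sequence).
toy_ref = "MKR-EYLA"
toy_other = "MK-QEFLA"
print(map_residue(toy_ref, toy_other, 5))
```

A position reported this way is only a candidate; where crystal structures are available, the structural comparison described above would still be needed to confirm that the two residues occupy equivalent positions.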

In addition to producing and screening for mutant polymerases capable of incorporating modified nucleotides it may also be useful in some instances to screen for other polymerase properties. In general the additional desirable polymerase properties described below are more difficult to assay than incorporation of modified nucleotides, so assays for these additional properties may be conducted as a second screen of mutant polymerases with demonstrated capacity to incorporate modified nucleotides. One aspect of this invention is that cleavage at modified nucleotides may be caused or enhanced by contact between the modified nucleotides and a polymerase (see Example and FIGS. 20-26). This is a preferred cleavage mode as it obviates a separate cleavage step. Thus it is useful to assay mutant polymerases for cleavage-enhancing properties. One simple assay for such properties is a primer extension where the extension sequence following the primer includes the cleavable nucleotide(s) followed by the first occurrence of a different nucleotide that is detectably labeled. In the event of polymerase assisted cleavage the labeled molecule will be separated from the primer, resulting in a smaller labeled molecule which can be detected by electrophoretic or other methods. A second useful property of mutant polymerases is the ability to recognize a modified nucleotide or nucleotides in a template strand and catalyze incorporation of the appropriate complementary nucleotide (natural or modified) on the nascent complementary strand. This property is a necessary condition for a polymerase to be used in a cycling procedure such as PCR, where newly synthesized polynucleotides serve as templates in successive rounds of amplification.
A simple assay for such properties is a short primer extension where the template strand is synthesized with the modified nucleotide or nucleotides occurring shortly after the end of the primer, such that a primer extension reaction will soon encounter the modified nucleotide(s). Successful polymerization across the template, indicating use of the modified nucleotide(s) as templates, will result in a longer extension product than failure to utilize the modified nucleotides as templates. The extension product can be made easily detectable by synthesizing the template so as to cause templated incorporation of a labeled nucleotide only after traversing the modified nucleotide(s). The sequence of the extension product can subsequently be determined to confirm that the nucleotides incorporated on the extension strand opposite the modified nucleotides are correct. Still other attractive properties of polymerases include high fidelity, thermostability and processivity. Assays for these properties are known in the art.

Example 2 Variance Detection by Mononucleotide Restriction

The following procedure is an example of nucleotide sequence variance detection in a polynucleotide without the necessity of obtaining the complete sequence of the polynucleotide. While the modified nucleotide used in this example is 7-methylguanine (7-methylG) and the polynucleotide under analysis is a 66 base-pair fragment of a specific DNA, it is understood that the described technique may be employed using any of the modified nucleotides discussed above or any other modified nucleotides which, as noted above, are within the scope and spirit of this invention. The polynucleotide may be any polynucleotide of any length that can be produced by a polymerase.

A 66 base pair region of the 38 kDa subunit of replication factor C (RFC) cDNA was amplified by PCR (polymerase chain reaction). Three primers were used in two separate amplification reactions. The forward primer (RFC bio) was biotinylated. This allows the isolation of a single-stranded template using avidin-coated beads, which can then be extended using the exo-minus Klenow fragment of E. coli DNA polymerase to incorporate the 7-methylG. This also permits cleanup of the modified 7-methylG DNA after extension and prior to cleavage.

Two reverse primers were used in separate amplification reactions; one matched the natural sequence for the RFC gene (RFC), the other (RFC mut) introduced a base mutation (T to C) into the 66 base pair RFC sequence. The primers and corresponding products are also labeled RFC 4.4 and RFC 4.4 Mut in some of the Figures herein.

Using PCR and the above two reverse primers, 66 base pair fragments were produced (FIG. 1). The two fragments differ at one position, a T to C change in the biotinylated strand and an A to G change in the complementary strand (encoded by the two reverse primers). The PCR products were purified using streptavidin agarose and the non-biotinylated strand from each PCR product was eluted and used as a template for primer extension. The biotinylated primer RFC bio was extended on these templates in the presence of dATP, dCTP, dTTP and 7-methyl dGTP. The extended products were purified using streptavidin agarose and then washed in the presence of alkali to remove the complementary strand not modified by 7-methyl-dGTP.

The streptavidin agarose-bound single-stranded DNA was then incubated with piperidine for 30 minutes at 90° C. to cleave at sites of incorporation of 7-methylG into the modified DNA fragment. This treatment also resulted in the separation of the biotinylated fragment from streptavidin. The reaction mixture was subjected to centrifugation and the polynucleotide-containing supernatant was transferred to a new tube. The DNA was dried in a speed vac and re-suspended in deionized water. This sample was then subjected to MALDI mass spectrometry.

FIG. 2 shows the molecular weights of the expected fragments of interest as a result of the cleavage of the biotinylated DNA strand at each site of incorporation of 7-methylG. These fragments and their molecular weights are: a 27-mer (8772.15), a 10-mer (3069.92), an 8-mer (2557.6), and one of the following 10-mers depending on the reverse primer used in the PCR reaction, RFC (3054.9) or RFC mut (3039.88). The biotinylated 20-mer primer is also present because it was provided in excess in the extension reaction. The 10-mer fragments for RFC and RFC mut, which differ by 15 Daltons, are the ones that should be detected and resolved by mass spectrometry, thus revealing the point mutation.
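The 15 Dalton difference between the two variable 10-mers can be checked with simple arithmetic, sketched below for illustration only. The fragment masses are those recited above; the per-nucleotide values in RESIDUE_MASS are standard approximate average masses of nucleotide residues within a DNA chain and are an assumption brought in for the sketch, not values given in this description.

```python
# Fragment masses recited in the text for the variable 10-mer (Daltons).
RFC_10MER = 3054.9
RFC_MUT_10MER = 3039.88

# Assumed approximate average masses of nucleotide residues in a DNA
# chain (Daltons); standard textbook values, not taken from this document.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def substitution_shift(old_base, new_base):
    """Mass change expected when old_base is replaced by new_base."""
    return RESIDUE_MASS[new_base] - RESIDUE_MASS[old_base]


predicted = substitution_shift("T", "C")   # shift expected for T to C
recited = RFC_MUT_10MER - RFC_10MER        # difference between recited masses
print(f"predicted shift: {predicted:.2f} Da, recited difference: {recited:.2f} Da")
```

Under these assumed residue masses the predicted shift for a T to C change agrees with the difference between the two recited 10-mer masses, which is why a single-base substitution is resolvable in this mass range.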

FIG. 3 shows a denaturing polynucleotide sequencing gel analysis of the RFC and RFC mut Klenow polymerase extension fragments before and after cleavage with piperidine. All the expected fragments were present in both cases. Most of the additional minor bands are the result of incomplete cleavage of the DNA strand by piperidine. Complete cleavage may be achieved through two cycles of piperidine treatment using freshly distilled piperidine for 30 minutes at 90° C., with each cycle being followed by drying and washing of the samples (data not shown). The band from the RFC mut cleavage (lane 4 of FIG. 3) which runs between the 8-mer and the 10-mer is the only band not explained by complete or incomplete cleavage.

FIG. 4 is the mass spectrogram of the RFC sample. The peak on the far right is the biotinylated primer band that was used as a standard to calculate the molecular weights of all other bands. The left side of the spectrogram reveals all three expected cleavage bands (two 10-mers and an 8-mer). The insert in FIG. 4 is a magnified view of the region surrounding the two 10-mers and the 8-mer. The molecular weights in this region were all uniformly off by about 20 Daltons because the primer used for calibration was off by 20 Daltons. However, the mass differences between the peaks were all exactly as predicted.

FIG. 5 shows the mass spectrogram and a magnified portion thereof from the RFC mut sample. Two peaks should remain the same between the RFC and RFC mut samples, one of the 10-mers (3089.67) and the 8-mer (2576.93). The molecular weight of the remaining 10-mer should be decreased in the RFC mut sample by 15.02 Da (from 3054.9 to 3039.88) due to the single T to C switch, and the mass difference between it and the unchanged RFC 10-mer should be 30.04 Da (3039.88 vs. 3069.92). However, the mass difference actually obtained from the RFC mut sample was 319.73 Da. This might be due to a deletion of a C from the 10-mer corresponding to nucleotides 57-66. This would also explain the anomalous 9-mer on the RFC mut sequencing gel (FIG. 3). For this to be so, the commercially obtained reverse primer used in the amplification reaction, which encodes the complementary strand, would have to have been missing a G. The expected molecular weights for the RFC primer, the RFC mut primer and the RFC mut primer with a single G deletion are shown in Table 6. To test the hypothesis that an error had occurred in the synthesis of the RFC mut oligonucleotide primer, the RFC and RFC mut oligonucleotides were then combined and subjected to mass spectrometry. As can be seen from the mass differences obtained (FIG. 6 and Table 6), the hypothesis was correct: the RFC mut primer was indeed missing one G.
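The reasoning that leads from the anomalous mass difference to a single-nucleotide deletion can be sketched as follows, for illustration only. The expected and observed mass differences are those recited above; the residue masses are standard approximate average values assumed for the sketch, and the function name is hypothetical.

```python
# Values recited in the text (Daltons).
UNCHANGED_10MER = 3069.92
EXPECTED_MUT_10MER = 3039.88
OBSERVED_DIFFERENCE = 319.73

# Assumed approximate average masses of nucleotide residues in a DNA
# chain (Daltons); standard textbook values, not taken from this document.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def best_single_deletion(excess_mass):
    """Return (base, residual) for the single deleted base whose mass
    best accounts for the unexplained mass difference."""
    base = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - excess_mass))
    return base, excess_mass - RESIDUE_MASS[base]


expected_difference = UNCHANGED_10MER - EXPECTED_MUT_10MER   # about 30.04 Da
excess = OBSERVED_DIFFERENCE - expected_difference           # unexplained part
base, residual = best_single_deletion(excess)
print(f"unexplained mass: {excess:.2f} Da; best match: loss of one {base} "
      f"(residual {residual:.2f} Da)")
```

Under these assumptions the unexplained mass is closest to that of one C residue, consistent with the deletion proposed above for the biotinylated strand and hence with a missing G in the complementary reverse primer.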

The power of the method of this invention is dramatically revealed in the above experiment. What began as a controlled test of the method using a known sequence and a known nucleotide variance actually detected an unknown variance in an unexpected place: the RFC mut primer.

Example 3 Variance Detection by Dinucleotide Restriction

A restriction enzyme that has a four base pair recognition site will cleave DNA specifically with a statistical frequency of one cleavage every 256 (4⁴) bases, resulting in fragments that are often too large to be analyzed by mass spectrometry (FIG. 19A). Our chemical dinucleotide restriction strategy, on the other hand, would result in much smaller fragments of the same polynucleotide. The average size of the fragments obtained is 16 (4²) bases (FIG. 19B), which is quite amenable to mass spectrometry analysis.
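The statistical frequencies quoted above follow from a simple calculation, sketched here for illustration only. The sketch assumes that the four bases occur with equal frequency and independently of one another, which real sequences only approximate; the function name is hypothetical.

```python
def expected_fragment_length(site_length, alphabet_size=4):
    """Average spacing between occurrences of a specific recognition
    sequence of the given length, assuming equal and independent base
    frequencies (one occurrence every alphabet_size ** site_length bases)."""
    return alphabet_size ** site_length


print(expected_fragment_length(4))  # four base recognition site
print(expected_fragment_length(2))  # dinucleotide restriction
```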

An example of this chemical restriction principle is illustrated in FIG. 20. Depicted in this figure is a dinucleotide pair having a ribonucleotide and a 5′-aminonucleotide connected in 5′ to 3′ orientation, thereby positioning the 2′-hydroxyl group of the ribonucleotide in close proximity to the phosphoramidate linkage. The chemical lability of the phosphoramidate linker is enhanced since the hydroxyl group can attack the phosphorus atom to form a 2′,3′-cyclic phosphate, resulting in the cleavage of DNA at this particular dinucleotide site.
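The fragments to be expected from a given dinucleotide restriction can be predicted from the sequence alone. The following is a minimal sketch under stated assumptions: cleavage is taken to be complete and to occur between the two bases of every occurrence of the chosen dinucleotide, end-group chemistry is ignored, and the function name and the test sequence are hypothetical.

```python
def dinucleotide_digest(sequence, dinucleotide):
    """Split a sequence between the two bases of every occurrence of the
    given dinucleotide (first base = ribonucleotide, second base =
    5'-aminonucleotide), assuming complete cleavage."""
    if len(dinucleotide) != 2:
        raise ValueError("dinucleotide must be exactly two bases")
    fragments = []
    start = 0
    for i in range(len(sequence) - 1):
        if sequence[i:i + 2] == dinucleotide:
            fragments.append(sequence[start:i + 1])  # ends with the ribonucleotide
            start = i + 1                            # next begins with the amino nucleotide
    fragments.append(sequence[start:])
    return fragments


# Hypothetical sequence, used only to show the behaviour of the function.
print(dinucleotide_digest("AAGTCCGTTAGT", "GT"))
```

Each fragment other than the last ends with the ribonucleotide, and each fragment other than the first begins with the 5′-aminonucleotide, which mirrors the cleavage position described above.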

Shown in FIG. 21 is an actual application of this approach. A 5′-³²P labeled 20 nt primer was extended with a mixture of Klenow (exo-) and E710A Klenow (exo-) polymerases using an 87 nt single stranded template in a Tris buffer at pH 9. The primer extension was performed with riboGTP (lane 1), 5′-aminoTTP (lane 3), or riboGTP/5′-aminoTTP (lane 5) in place of the corresponding natural nucleotides. After the extension, the reaction mixtures were purified on a G25 column. The riboG-containing extension product was cleaved with aqueous base to generate a G sequencing ladder (lane 2). The 5′-aminoT-containing product was, on the other hand, acid labile and was cleaved to afford a T sequencing ladder (lane 4). Under the conditions of the extension reaction with riboGTP/5′-aminoTTP (lane 5), a 64 nt product was obtained instead of the expected 87 nt. Interestingly, the 64 nt fragment is one of the dinucleotide cleavage products expected for GT restriction and the only one that should be visible by autoradiography. Acid cleavage of this product produced a T ladder (lane 6) whereas base cleavage generated a G ladder (lane 7), indicating the successful incorporation of both riboGTP and 5′-aminoTTP into the polynucleotide. From these results it can be concluded that GT restriction cleavage had occurred during the extension and/or workup procedures, most likely due to the synergized lability of the two modified nucleotides.

In order to visualize all three expected restriction fragments, the same extension-cleavage experiment was performed in the presence of α-³²P-dCTP. As shown in FIG. 22, three GT restriction fragments were observed with the expected relative mobility and specific radioactivity.

The versatility of this dinucleotide restriction approach is demonstrated by AT restriction of the same DNA. Specific AT restriction was observed by polyacrylamide gel electrophoresis (PAGE) analysis (FIG. 23). A similarly generated non-radioactive product was analyzed by MALDI-TOF mass spectrometry (FIG. 24). All the expected restriction fragments were observed except for a 2 nt fragment that is lost during G25 column purification.

The general applicability of this technology was further demonstrated using a longer, different DNA template (FIGS. 25 and 26). Primer extension with riboATP and 5′-aminoTTP followed by AT restriction generated the expected oligonucleotides as observed by PAGE analysis (FIG. 25) or MALDI-TOF mass analysis (FIG. 26).

Example 4 Genotyping by Complete Substitution/Complete Cleavage

The following genotyping procedure by chemical restriction is an attractive alternative to other genotyping methods with many advantages including increased accuracy and speed. In general, this method involves PCR amplification of genomic DNA using chemically modified nucleotides followed by chemical cleavage of the resulting amplicons at the modified bases. Shown in FIG. 27 is a schematic presentation of this technique. One of the primers (Primer 1) is designed to be close to the polymorphic site of interest so that one of the polymorphic bases (e.g., A) may be selected as the first cleavable nucleotide. After PCR amplification with the chemically modified nucleotide (supplemented with the other three natural nucleotides), only one of the two alleles would be cleavable at the polymorphic site. Treatment with chemical reagents would afford cleavage products comprising Primer 1, whose length can reveal the genotype of the sample. Analysis by either mass spectrometry or electrophoresis can be implemented for identifying the expected length difference. Furthermore, mass spectrometry analysis may unmask the single base difference on the complementary strand of DNA that contains the polymorphism, providing a built-in redundancy and higher accuracy.

Illustrated in FIGS. 28 to 31 are the chemical cleavage and analysis procedures utilized to genotype the transferrin receptor (TR) gene. An 82 bp DNA sequence of the TR gene was selected based on the location of the polymorphism and the efficiency of amplification (FIG. 28). The polymorphic base (A or G) is positioned 3 bases from the 3′ end of Primer 1. For the A allele it is the first modified nucleotide to be incorporated; for the G allele, the first cleavable base is 6 bases from the primer. As a result, fragments of different lengths are produced from chemical cleavage. The PCR amplification reactions (50 μl each) were carried out in standard buffer with polymerase AmpliTaq Gold (0.1 unit/μl) on a thermal cycler (MJ Research PTC-200) using 35 cycles of amplification (1 min denaturation, 1.5 min annealing, and 5 min extension). Analysis of the PCR products on a 5% non-denaturing polyacrylamide gel (stained with Stains-All from Sigma) showed that 7-deaza-7-nitro-dATP can replace dATP for efficient PCR amplification (FIG. 28).
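The way in which the position of the first cleavable base determines the length of the Primer 1-containing fragment can be sketched as follows, for illustration only. The primer and downstream sequences are invented and are not the TR sequences of FIG. 28; the sketch further assumes that cleavage removes the modified nucleotide, so that the fragment consists of the primer plus the bases preceding the first cleavable position. Only the placement of the polymorphic base (third base after the primer) and of the next cleavable base (sixth base after the primer) follows the description above.

```python
def primer_fragment_length(primer, downstream, cleavable="A"):
    """Length of the primer-containing cleavage fragment: the primer plus
    every base incorporated before the first cleavable nucleotide.
    Returns None when no cleavable base follows the primer."""
    index = downstream.find(cleavable)
    if index == -1:
        return None
    return len(primer) + index


# Hypothetical 10 nt primer and hypothetical extension sequences.
toy_primer = "GCTTGCCTGC"
a_allele_extension = "CTATGACCGT"  # polymorphic A is the 3rd base after the primer
g_allele_extension = "CTGTGACCGT"  # G at the site; first A is the 6th base

length_a = primer_fragment_length(toy_primer, a_allele_extension)
length_g = primer_fragment_length(toy_primer, g_allele_extension)
print(length_a, length_g, length_g - length_a)
```

With the placement described above, the two alleles yield Primer 1 fragments that differ by three nucleotides, which is the kind of length difference that either mass spectrometry or electrophoresis can resolve.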

To the PCR products from 7-deaza-7-nitro-dATP were directly added piperidine, tris-(2-carboxyethyl)phosphine (TCEP), and Tris base to final concentrations of 1 M, 0.2 M, and 0.5 M, respectively, in a total volume of 100 μl. After incubation at 95° C. for 1 hour, 1 ml of 0.2 M triethylammonium acetate (TEAA) was added to each reaction mixture and the resulting solution purified on an OASIS column (Waters). The eluted products were concentrated to dryness on a Speedvac and the residue analyzed by mass spectrometry or electrophoresis. FIG. 29 shows the sequences of selected fragments expected from cleavage at 7-deaza-7-nitro-dA. The sequences are grouped according to lengths and molecular weights. The first group contains longer fragments that are extended from primers. The 22 nt fragment is an invariant fragment, which may be used as an internal reference. The 25 nt or 28 nt fragment is expected from the A or G allele, respectively. The shaded group of sequences are from the complementary strand of DNA, including invariant 13 nt and 11 nt fragments that can be used as internal references and a pair of 11 nt fragments expected from the two allelic forms of the TR gene with a 15 Da mass difference. Shown in FIG. 30(a) is a MALDI-TOF spectrum of chemically cleaved products from an 82 bp heterozygote TR DNA sample. Highlighted in the spectrum are the two regions that contain fragments depicted in FIG. 29.

Each purified cleavage sample was mixed with 3-hydroxypicolinic acid and subjected to MALDI-TOF analysis on a Perceptive Biosystems Voyager-DE mass spectrometer. Mass spectra in the region of 7000-9200 Daltons were recorded and the results for the three TR genotypes are shown in FIG. 30(b). The spectra were aligned using the peak representing the invariant 22 nt fragment (7189 Da). Two additional peaks were observed for the AG heterozygote sample, with one corresponding to the A allele (8057 Da) and the other to the G allele (9005 Da). As expected, only one additional peak was observed for the GG or AA homozygote samples, each with the molecular weight of the cleavage fragment from the G or A allele. FIG. 31(a) shows a mass spectrum of the AG heterozygote sample in the region of 3700-4600 Da. With the 3807 Da and 4441 Da fragments as internal references, the genotype of this sample was confirmed through the observation of two peaks in the middle of the spectrum with a 15 Da mass difference. The molecular weights observed by mass spectrometry indicated that phosphate-deoxyribose-TCEP adducts were uniformly formed during the cleavage reaction, resulting in fragments that are modified at the 3′ end (FIG. 31(b)). The data shown in FIG. 30 and FIG. 31 also illustrate that the combination of chemical restriction with mass spectrometry can provide corroborating genotyping information from both strands of DNA, thereby assuring the accuracy of the analysis.
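The assignment of a genotype from the 7000-9200 Dalton region can be reduced to a simple decision rule, sketched here for illustration only. The three reference masses are those recited above; the matching tolerance, the function name and the example peak lists are assumptions made for the sketch, and the peaks are assumed to have been aligned to the invariant fragment beforehand.

```python
# Masses recited in the text (Daltons).
INVARIANT_22NT = 7189.0
A_ALLELE_FRAGMENT = 8057.0
G_ALLELE_FRAGMENT = 9005.0


def call_genotype(peaks, tolerance=5.0):
    """Call AA, AG or GG from a list of peak masses. The tolerance (in
    Daltons) is an assumed value, not one given in the description.
    Returns 'no call' when the internal reference or both allele peaks
    are missing, e.g. after a failed amplification."""
    def present(target):
        return any(abs(peak - target) <= tolerance for peak in peaks)

    if not present(INVARIANT_22NT):
        return "no call"
    has_a = present(A_ALLELE_FRAGMENT)
    has_g = present(G_ALLELE_FRAGMENT)
    if has_a and has_g:
        return "AG"
    if has_a:
        return "AA"
    if has_g:
        return "GG"
    return "no call"


# Hypothetical peak lists, for demonstration only.
print(call_genotype([7189.4, 8056.1, 9006.2]))
print(call_genotype([7188.7, 9004.5]))
```

Requiring the invariant peak before making a call reflects its role as an internal reference: a spectrum without it cannot distinguish a true absence of an allele peak from a failed reaction.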

Alternatively, the chemically restricted samples may be analyzed by electrophoresis to detect the diagnostic length difference resulting from the two alleles. Capillary electrophoresis (CE) analyses were performed using a homemade instrument with a UV detector and a capillary containing denaturing linear polyacrylamide gel. FIG. 32(a) shows the CE chromatogram obtained from TR samples of various genotypes. As predicted, each genotype showed a distinct elution pattern corresponding to the lengths of the expected cleavage products. Whereas the AA homozygote produced a 25 nt fragment and the GG homozygote generated a 28 nt fragment, the AG heterozygote sample afforded both 25 nt and 28 nt products. After being labeled at the 5′ end with ³²P, the cleavage samples were subjected to PAGE analysis. The resulting autoradiogram in FIG. 32(b) demonstrates that the cleavage is specific with little or no background and that the genotyping results are unambiguous.

Another alternative detection method involves the application of fluorescence resonance energy transfer (FRET). FRET has been successfully applied for polymorphism detection by TaqMan assays (Todd J. A., et al. 1995, Nature Genetics, 3:341-342) and Molecular Beacons (Tyagi, S. et al. 1998, Nature Biotechnology, 16:49-53). However, when longer probes are necessary to achieve hybridization to target sequences (e.g., AT rich sequences), it becomes increasingly difficult to distinguish the vanishingly small difference resulting from a single nucleotide mismatch. The advantage of chemical restriction in this regard is illustrated in FIG. 33. Similar to the aforementioned example, a modified nucleotide analog of one of the polymorphic bases (e.g., A) is used in place of its natural counterpart in the PCR amplification. Primer 1 is designed to be close to the polymorphic site so that the polymorphic base A would be the first cleavable nucleotide for the A allele. Primer 1 is also labeled with a fluorescent group (F1) positioned close to the 3′ end (FIG. 33(a)). After amplification and chemical restriction, a probe covalently attached to another fluor F2 (shown in FIG. 33(b)) can be added and the FRET effect between the two fluorophores measured. Because one of the alleles was cleaved closer to the 3′ end of Primer 1 than the other, the difference between them in hybridization is expected to be greater than that of a single nucleotide mismatch, and may be exploited to distinguish the two allelic targets. As depicted in FIG. 33(c), the experimental temperature can be adjusted so that only the longer fragment from the G allele can hybridize with the probe, resulting in FRET. Since in this system a “NO FRET” result could be interpreted either as allele A or as failed PCR amplification, it is necessary to measure the fluorescence of each sample at various temperatures to ensure the positive detection of the shorter fragment from allele A at a lower temperature.
Alternatively, this positive detection may be achieved through the use of a hairpin probe as depicted in FIG. 33(d). The probe has a 5′ end tail that folds back to form a hairpin, in addition to a fluor F3 at the 5′ end. With the short cleavage fragment from the A allele, the hairpin probe can form a bridging duplex as depicted, generating detectable FRET between F1 and F3. Only with the longer fragment from the G allele can the inter-strand hybridization compete with the stability of the hairpin and result in loss of FRET between F1 and F3.

Example 5 Complete Sequencing by Partial Substitution/Partial Cleavage

Using the following procedure, it is entirely possible to sequence, in one set of sequencing reactions, a polynucleotide consisting of 10,000, 20,000 or even more bases by polymerization in the presence of modified nucleotides, enzymatic restriction of polymerization products, purification of restriction fragments and chemical degradation to produce sequence ladders from each fragment. The procedure is limited only by the size of the template and the processivity (the ability to continue the polymerization reaction) of the polymerase used to extend the primer. Unlike a shotgun cloning library in which there is a normal distribution of sequence inserts requiring highly redundant sequencing, using the method described herein results in each nucleotide being sampled once and only once. Repeating the procedure using a second or even a third restriction enzyme cocktail will provide the sequence information needed to reassemble the sequences determined from the initial restriction in the proper order to reconstruct the full length polynucleotide sequence while also supplying the redundancy necessary to ensure the accuracy of the results. In the description which follows a variety of options for carrying out each step are provided. As before, it is understood that other modifications to the procedure described will be readily apparent to those skilled in the art; such other modifications are within the scope and spirit of this invention.

TABLE 6

Primer                  Molecular Weight    Mass Difference
RFC                     6099.6
RFC mut                 6115.9              +16
RFC mut (G deletion)    5786.7              −313.2

a. Anneal Primer and Template

The template used may be a small or a large insert cloning vector or an amplification product such as a PCR fragment; it may also be single- or double-stranded. For example, without limitation, the template may be a plasmid, phagemid, cosmid, P1, PAC, BAC or YAC clone. The template is ideally rendered linear before extension to ensure that all extension products terminate at the same place. This can be accomplished by restricting the template with a restriction endonuclease. For example, the templates may be prepared in a vector that has restriction sites for one or more rare cutters on either side of the cloning site so that a linear template can be routinely prepared by restriction using the rare cutter enzyme (i.e., an enzyme that cleaves, for example, a 7 or 8 nucleotide motif). Many plasmid vectors such as, without limitation, Bluescript (Stratagene, Inc.) have these features. A primer can be selected which will anneal to a sequence in the vector, for example, the M13 universal primer sequences. This allows the sequencing of a library of clones using only one or two primers (one from each side of the insert). Alternatively, a series of insert-specific primers may be used (at approximately 5-20 kb intervals) in a version of primer walking.

b. Extend Primer in Presence of All Four Natural Deoxyribonucleotides and a Modified Nucleotide Corresponding to One of the Natural Nucleotides.

The procedures discussed above are used to extend the primer over the entire length of the template using one of the modified nucleotides described above or any other modified nucleotide which is capable of imparting selective cleavage properties to the modified polynucleotide. In general, the ratio of modified nucleotide to its natural counterpart can vary over a considerable range from very little (approximately 1%) to complete (≥99%) substitution. The controlling factor is the efficiency of the subsequent chemical cleavage reaction. The more efficient the cleavage reaction, the lower the level of incorporation can be. The goal is to have approximately one modified nucleotide per restriction fragment so that, after cleavage, each molecule in the reaction mixture contributes to the sequencing ladder. FIG. 7 shows one such modified polynucleotide, a linearized, single-stranded M13 template extended to 87 nucleotides in the presence of the modified nucleotide 5′-amino dTTP using the exo-minus Klenow fragment of E. coli DNA polymerase. FIG. 9 shows a 7.2 Kb extension product, again produced from an M13 template in the presence of 5′-amino-dTTP and dTTP at a molar ratio of 100:1 (Panel A, extension product).
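The relationship between fragment length, cleavage efficiency and the level of substitution can be made concrete with a rough estimate, sketched below for illustration only. The model is an assumption and is not part of the procedure: it supposes that the substituted base makes up a fixed fraction of the sequence (one quarter by default) and that every incorporated modified nucleotide is cleaved with the same probability. The function and parameter names are hypothetical. The value returned is the fraction of positions that should carry the modified nucleotide in the product; it is not the molar ratio of nucleotides in the reaction, since a polymerase may discriminate strongly between a modified nucleotide and its natural counterpart.

```python
def target_substitution_fraction(fragment_length, cleavage_efficiency,
                                 base_fraction=0.25, sites_per_fragment=1.0):
    """Fraction of the chosen base's positions that should carry the
    modified nucleotide so that, on average, sites_per_fragment positions
    per restriction fragment are actually cleaved. Capped at 1.0
    (complete substitution)."""
    if not 0 < cleavage_efficiency <= 1:
        raise ValueError("cleavage_efficiency must be in (0, 1]")
    positions = fragment_length * base_fraction  # positions of the chosen base
    fraction = sites_per_fragment / (positions * cleavage_efficiency)
    return min(fraction, 1.0)


# Hypothetical inputs: a 500 nt fragment and 90% cleavage efficiency.
print(round(target_substitution_fraction(500, 0.9), 4))
```

Consistent with the statement above, the estimate falls as the cleavage efficiency rises, so an efficient cleavage reaction permits a low level of incorporation.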

c. Purify the Full Length Primer Extension Product (Optional)

In order to eliminate prematurely terminated (i.e., less than full length) polymerase extension products, thereby assuring a homogeneous sequencing ladder on electrophoresis after cleavage, it may be desirable to purify the full length or substantially full length extension products. It is noted, however, that the purification of the restriction fragments after digestion (step f, below) achieves essentially the same goal and, in most instances, is likely to suffice. In any event, the elimination of short extension products can be accomplished by numerous procedures known in the art such as spun column chromatography or high performance liquid chromatography (HPLC). FIG. 8 shows a purified full-length extension product before (Panel A) and after (Panel B) chemical cleavage with acid.

d. Cleave the Primer Extension Product with One or More Restriction Enzymes.

As noted previously, the optimal size for DNA sequencing templates (in this case, of restriction products) is approximately 300 to about 800 nucleotides when gel electrophoresis is to be used for the creation of the sequencing ladder. Thus restriction endonucleases must be employed to reduce the full-length extension product of 10 Kb or more to manageable size. Numerous such endonucleases are known in the art. For example, many four-base restriction endonucleases are known and these will generally yield restriction products in the desired range. Shorter restriction fragments, e.g., less than 300 nucleotides, can also be sequenced, but to make the most efficient use of gel runs, it is desirable to separate the restriction fragments into sets according to their length. The shorter fragments will then require relatively brief sequencing run times while the longer fragments will require a longer gel and/or longer run times. Two or more restriction endonuclease cocktails, each containing one or more restriction endonucleases and a compatible buffer, can be used to provide the overlapping sequence information necessary to re-assemble the complete sequence of the polynucleotide from the restriction fragments. FIG. 9 shows an exemplary restriction endonuclease digestion of a primer/template complex extended in the presence of dTTP and the modified nucleotide 5′-amino dTTP. As can be seen in FIG. 9, complete cleavage was obtained using the restriction endonuclease Msc I. Other Msc I restriction products are not seen because only the 5′ end of the primer extension product was labeled with ³²P.
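The grouping of restriction fragments into sets according to length can be planned in advance from a known or draft sequence. The following is a minimal sketch under stated assumptions: only one strand is modeled, digestion is taken to be complete, and the function names and the short test sequence are hypothetical. The recognition sequence TGGCCA with the cut after the third base is the commonly documented specificity of Msc I and is not recited in this description.

```python
def digest(sequence, site, cut_offset):
    """Cut a single-stranded representation of the product at every
    occurrence of the recognition site, cut_offset bases into the site.
    Assumes complete digestion."""
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


def bin_by_length(fragments, short_limit=300, long_limit=800):
    """Group fragment lengths into sets for separate sequencing runs,
    using the approximately 300 to 800 nucleotide range as 'optimal'."""
    bins = {"short": [], "optimal": [], "long": []}
    for fragment in fragments:
        length = len(fragment)
        if length < short_limit:
            bins["short"].append(length)
        elif length <= long_limit:
            bins["optimal"].append(length)
        else:
            bins["long"].append(length)
    return bins


# Hypothetical short sequence containing two TGGCCA sites.
print(digest("AATGGCCATTTGGCCAGG", "TGGCCA", 3))
```

Running the same sequence through two different recognition sites gives two fragment sets whose boundaries differ, which is the basis for the overlapping information described above.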

e. Label the Restriction Endonuclease Products.

To visualize the DNA sequencing ladder generated by this method, it is necessary to label the restriction endonuclease products with a detectable label. Many such labels are known in the art; any of them may be used with the methods of this invention. Among these are, without limitation, radioactive labels and chemical fluorophores. For instance, ³⁵S-dATP (Amersham Pharmacia Biotech, Inc) or rhodamine-dUTP (Molecular Probes) can be incorporated at the primer extension step. Alternatively, the DNA can be labeled after restriction by modification of the restriction fragment ends by, without limitation, T4 polynucleotide kinase or filling recessed ends with a DNA polymerase and a labeled nucleotide. Such end labeling is well known in the art (see, for example, Ausubel, F. M., et al., Current Protocols in Molecular Biology, John Wiley & Sons, 1998). End labeling has the advantage of putting one molecule of label on each DNA fragment, which will afford homogeneous sequencing ladders. Labeling of the template strand is of no consequence since it will not be cleaved during the chemical cleavage reaction due to the absence of modified nucleotide in its sequence. Thus, no sequencing ladder will be produced for the template strand.

f. Separate the Labeled Restriction Endonuclease Products.

The restriction fragments must be separated prior to chemical cleavage. Numerous methods are known in the art for accomplishing this (see, for example, Ausubel, F. M., op. cit.). A particularly useful technique is HPLC, which is rapid, simple, effective and automatable. For example, FIG. 10 shows the resolution obtained by HPLC on Hae III restricted PhiX174 DNA. Ion-pair reverse-phase HPLC and ion exchange HPLC are two preferred methods of separation.

g. Cleave the Separated Labeled Restriction Endonuclease Fragments at Sites of Modified Nucleotide Incorporation.

Depending on the modified nucleotide incorporated, use one of the cleavage reactions previously described herein or any other cleavage reaction which will selectively cleave at the site of incorporation of the modified nucleotide, such other cleavage reactions being within the scope and spirit of this invention.

h. Determine the Sequence of the Fragment.

FIG. 11 shows the sequence ladder obtained from a polynucleotide in which T has been replaced with 5′-amino T. This ladder, of course, only reveals where T occurs in the complete sequence of the target polynucleotide. To obtain the entire sequence, the above procedure would be repeated three more times, in each case replacing one of the remaining nucleotides, A, C and G, with a corresponding modified nucleotide; e.g., 5′-amino-dATP, 5′-amino-dCTP or 5′-amino-dGTP. When all four individual fragment ladders are in hand, the complete sequence of the polynucleotide can easily be re-constructed by analysis and comparison of gel sequencing data.
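As a minimal sketch of the final comparison step, and assuming each base-specific ladder has already been read as a list of 1-based positions counted from the labeled end (an assumption made here for illustration, as are the function name and the toy data), the four ladders can be merged as follows.

```python
def assemble_from_ladders(ladders, length):
    """Merge four base-specific ladders into one sequence.

    `ladders` maps each base to the 1-based positions at which that base
    was detected. Positions not covered by any ladder stay as 'N'.
    """
    sequence = ["N"] * length
    for base, positions in ladders.items():
        for position in positions:
            if sequence[position - 1] != "N":
                # Two ladders claim the same position: the gel reads disagree.
                raise ValueError(f"conflicting calls at position {position}")
            sequence[position - 1] = base
    return "".join(sequence)


# Toy ladders (not from the specification) for a 7 nucleotide fragment.
toy_ladders = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
toy_sequence = assemble_from_ladders(toy_ladders, 7)
```

Positions left as `N`, or a raised conflict, would flag the parts of a fragment where a ladder needs to be re-read or re-run.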

Example 6 Complete Sequencing by Substantially Complete Substitution/Substantially Complete Cleavage Combined with Mass Spectrometry.

The preceding procedure for complete sequencing of a polynucleotide still requires the use of gel electrophoresis for creating fragment ladders from which the sequence is read. As noted previously, gel electrophoresis is a time and labor intensive process which also requires a fair degree of skill to carry out in such a manner as to have a reasonable assurance of reproducible and accurate results. It is an aspect of this invention that the use of gel electrophoresis can be eliminated completely and replaced with relatively simple to use, fast, sensitive, accurate, automated mass spectrometry. The basis for this aspect of this invention is the previously discussed uniqueness in the molecular weights of virtually all 2-mers through 14-mers with the exception of the 8 fragment pairs described above (and other fragment pairs that are based on addition of identical sets of nucleotides to the 8 fragment pairs). The following is an example of how this procedure would be carried out. While the example is described in terms of human intervention and specific analyses at each step, it will be readily apparent to those skilled in the art that a computer program could be devised to completely automate the analytic procedure and further increase the speed of this aspect of this invention. The use of such a computer program is, therefore, within the scope and spirit of this invention.

The procedure for determining complete nucleotide sequences by mass spectroscopy would entail the following steps:

-   a. substantially complete replacement of a natural nucleotide in a polynucleotide with a modified nucleotide to form a modified polynucleotide. This would be accomplished by an amplification procedure or by primer extension employing the polymerase reaction discussed above. Optionally, the procedure disclosed above could be used to arrive at the optimal polymerase or set of polymerases for preparing the desired modified polynucleotide;
-   b. cleavage of the modified polynucleotide under conditions that favor substantially complete cleavage at and essentially only at the points of incorporation of the modified nucleotide in the modified polynucleotide; and,
-   c. determination of the masses of the fragments obtained in the preceding cleavage reaction.

The above three steps are then repeated three more times, each time using a different modified nucleotide corresponding to one of the remaining natural nucleotides. The result will be a series of masses from which all or most of the sequence of the entire original polynucleotide can be ascertained. Any sequence ambiguity that remains after the main analysis is done should be readily resolved by using one or more reactions involving a contiguous dinucleotide substitution/cleavage reaction or by a conventional DNA sequencing procedure. The following is an example of how the analysis of a fragment would proceed.

Given the 20 nucleotide natural oligomer extended from a 16 mer primer, 5′-primer-TTACTGCATCGATATTAGTC-3′, polymerization in the presence of dTTP, dCTP, dGTP and a modified dATP will result, after substantially complete cleavage, in six fragments having five distinct masses, which are shown in Table 7. Carrying out the procedure three more times for the remaining three natural nucleotides will result in three more sets of fragments, the masses of which are also shown in Table 7. From these masses, the nucleotide content (but not sequence, yet) of all the fragments can be uniquely determined. The actual sequence is determined by analyzing all four cleavage results together.

For example, looking at the masses of all the fragments in Table 7, it is readily discernible that only one mass in each cleavage set comprises more than 16 nucleotides, that all the other fragments are 3′ of the primer (since the fragment containing the primer must be at least 16 nt) and that there are two nucleotides after the primer in the A cleavage column, three in the C column, five in the G column and none in the T column. Therefore, the sequence must begin with TT followed by an A, then a C, an unknown nucleotide and then a G. The sequence must start with 2 T residues because neither A, C nor G cleavage occurs in this initial interval. Also, by adding the masses of the fragments in the different cleavage sets, it can be seen that the length of the unsequenced region is 20 nucleotides. The number of nucleotides in each of the four cleavage sets is also readily ascertainable: set A: (primer+2)+5+4+3+2=16; set C: (primer+3)+10+3+3+1=20; set G: (primer+5)+7+5+3=20; set T: 4+3+3+2+2+1=15. From this information it is clear that there must be overlapping fragments in the A and T sets, that is, two or more fragments contributing to a single observed mass.
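The nucleotide counts per cleavage set can be checked with a short Python sketch. It is illustrative only: the fragment lists are transcribed from Table 7, the residue masses (A 313, C 289, G 329, T 304 Da) are assumed nominal values, and the function names are hypothetical. Fragments that share a mass are counted once, as they would appear as a single peak.

```python
# Assumed nominal nucleotide residue masses in Daltons.
RESIDUE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}

# Fragments 3' of the primer, per cleavage set, in 5'-3' order.
# The first entry of each list is the part attached to the primer.
TABLE7_FRAGMENTS = {
    "A": ["TT", "ACTGC", "ATCG", "AT", "ATT", "AGTC"],
    "C": ["TTA", "CTG", "CAT", "CGATATTAGT", "C"],
    "G": ["TTACT", "GCATC", "GATATTA", "GTC"],
    "T": ["", "T", "TAC", "TGCA", "TCGA", "TA", "T", "TAG", "TC"],
}


def mass(fragment):
    return sum(RESIDUE_MASS[nt] for nt in fragment)


def nucleotides_in_distinct_masses(fragments):
    """Sum fragment lengths, counting each distinct observed mass once."""
    length_by_peak = {}
    for index, fragment in enumerate(fragments):
        # The primer-containing fragment is heavier by the primer mass,
        # so it can never coincide with an internal fragment.
        kind = "primer" if index == 0 else "internal"
        length_by_peak[(kind, mass(fragment))] = len(fragment)
    return sum(length_by_peak.values())


counts = {base: nucleotides_in_distinct_masses(frags)
          for base, frags in TABLE7_FRAGMENTS.items()}
```

Under these assumptions the A and T sets fall short of 20 because ATCG and AGTC share one mass in the A set, while TGCA and TCGA, and the two single T fragments, share masses in the T set.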

Subtracting the known mass of the primer from those fragments containing the primer reveals the nucleotide content of the sequence immediately following the primer. Thus, in lane A, the residual mass is 608 Daltons which, from Table 3, is seen to correspond to TT, which therefore must be the first two nucleotides in the unknown fragment sequence. The sequence following the primer is thus already known to be TTAC_G. From the mass of the 5 mer in the G lane (1514 Daltons), it can be seen that the 5-mer contains three Ts, an A and a C. Thus, the missing nucleotide must be a T; the leading sequence is TTACTG.

TABLE 7

5′-Primer-TTACTGCATCGATATTAGTC-3′ [SEQ. ID. No. 1]

| Cleave at modified: | A | Mass | C | Mass | G | Mass | T | Mass |
|---|---|---|---|---|---|---|---|---|
| Cleavage fragments listed in 5′-3′ order | primer-TT | 608 + primer | primer-TTA | 921 + primer | primer-TTACT | 1514 + primer | primer | primer only |
| | ACTGC | 1463 | CTG | 861 | GCATC | 1463 | T | 304 |
| | ATCG | 1174 | CAT | 845 | GATATTA | 2119 | TAC | 845 |
| | AT | 556 | CGATATTAGT | 3041 | GTC | 861 | TGCA | 1174 |
| | ATT | 860 | C | 289 | | | TCGA | 1174 |
| | AGTC | 1174 | | | | | TA | 556 |
| | | | | | | | T | 304 |
| | | | | | | | TAG | 885 |
| | | | | | | | TC | 532 |

Table 7 shows the nucleotide-specific cleavage patterns for the sequence shown at top, which consists of a primer of known sequence and length (not specified) followed by 20 nucleotides of ‘unknown’ sequence. Cleavages in this example occur via a mechanism that breaks the phosphodiester bond 5′ of the modified nucleotide. Each cleavage set includes one fragment containing the primer plus however many nucleotides follow the primer before the first occurrence of the modified nucleotide. The known mass of the primer can be subtracted from this (largest) mass to obtain the difference, which gives the mass and therefore the nucleotide content of the sequence immediately 3′ of the primer. The masses provided in the table reflect the presence of one external phosphate group in each cleavage mass; however, it should be recognized that, depending on the chemical nature of the nucleotide modification and the cleavage reaction, actual masses will likely differ from those shown in the table. Such differences are expected to be systematic and therefore do not invalidate the analysis.
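The cleavage pattern described above can be reproduced with a brief Python sketch, offered for illustration only. The function names are hypothetical and the residue masses (A 313, C 289, G 329, T 304 Da) are assumed nominal values. With these values the sums match the masses quoted in the step-by-step analysis (for example 608 for TT and 906 for TAC); Table 7 lists several internal fragments at values 61 Da lower, a constant offset that does not change the reasoning.

```python
# Assumed nominal nucleotide residue masses in Daltons.
RESIDUE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}

TARGET = "TTACTGCATCGATATTAGTC"  # the 20 nucleotides 3' of the primer


def cleave_5prime(sequence, base):
    """Split `sequence` immediately 5' of every occurrence of `base`.

    The first element is the stretch still attached to the primer
    (an empty string if cleavage occurs right after the primer).
    """
    fragments = [""]
    for nt in sequence:
        if nt == base:
            fragments.append(nt)      # a new fragment starts at the modified base
        else:
            fragments[-1] += nt
    return fragments


def mass(fragment):
    return sum(RESIDUE_MASS[nt] for nt in fragment)


cleavage_sets = {base: cleave_5prime(TARGET, base) for base in "ACGT"}
mass_sets = {base: [mass(f) for f in frags]
             for base, frags in cleavage_sets.items()}
```

The fragment lists produced this way are the same, column by column, as the fragments shown in Table 7.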

Turning now to the masses shown in the T lane of Table 7, the 906 Dalton mass must contain a T, an A and a C. Since there already is a known TAC sequence, it may tentatively be held that this is a confirming sequence, part of the overlap of the A and T cleavages. It, of course, cannot yet be ruled out that another 3-mer containing T, A and C exists in the fragment, which is why this assignment must remain tentative at this point.
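Reading nucleotide content from a mass, as done for the 608, 1514 and 906 Dalton masses, amounts to searching for base compositions whose masses sum to the observed value. The Python sketch below is illustrative only; the function name is hypothetical, exact integer matching is a simplification of real instrument tolerance, and the residue masses (A 313, C 289, G 329, T 304 Da) are assumed nominal values.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Assumed nominal nucleotide residue masses in Daltons.
RESIDUE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}


def compositions_for_mass(target, max_len=14):
    """Return every base composition (order ignored) whose mass equals `target`."""
    lightest = min(RESIDUE_MASS.values())
    hits = []
    for n in range(1, max_len + 1):
        if n * lightest > target:
            break  # even the lightest n-mer is already too heavy
        for combo in combinations_with_replacement("ACGT", n):
            if sum(RESIDUE_MASS[nt] for nt in combo) == target:
                hits.append(dict(Counter(combo)))
    return hits


content_608 = compositions_for_mass(608)
content_906 = compositions_for_mass(906)
content_1514 = compositions_for_mass(1514)
```

A mass that returns an empty list under these assumptions cannot be a stand-alone oligonucleotide, and a mass that returns exactly one composition fixes the nucleotide content but not the order.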

The next T cleavage fragment must, at a minimum, contain a T and a G. Two T cleavage masses permit this: 946 and 1235. Thus, the additional sequence must be either G followed by T (if the 946 mass is the next mass) or G followed by a C and an A, order not known, and then T. The sequence is now known to be either TTACTGGT or TTACTG(C,A)T (the parentheses and comma between nucleotides will be used to indicate unknown order).

Going back to the A cleavage reaction, it can be seen that the next cleavage mass after the TT must contain ACTG. Two masses, 1235 Da and 1524 Da, meet this criterion. If 1235 Da is correct, the seventh nucleotide in the sequence is A because cleavage has to have occurred at that nucleotide. If 1524 Da is correct, then the sequence is CA. CA is consistent with one of the two possibilities discussed above; thus the overall sequence so far is TTACTGCAT.

Looking next at the masses from the C cleavage reaction, it can be seen that the first mass after the initial TTA must be CTG(C, A). Since cleavage will occur 5′ of any C, the possibilities are CTG or CTGA; only the first of these is supported by the masses in the C lane. Thus the second mass fragment in the C lane must be CTG followed by another C (because cleavage has occurred at that point). The third mass in the C lane (906 Da) must contain a C, an A and a T, which confirms the previous sequence of CAT. This leaves only two possibilities for the remaining sequences, a C followed by the 10 mer or the 10 mer followed by a terminal C. However, if the former were the case, then a cleavage fragment from one of the other lanes, A, G, or T, should show a 3 mer, 4 mer or 5 mer which contains 2 Cs. Since none of the masses permit such an oligomer, the lone C must be at the 3′ end of the unknown fragment and the 10 mer is next after CAT, giving the following sequence: TTACTGCATC_ _ _ _ _ _ _ _ _ C.

Turning once again to the G cleavages, it is now known that a fragment must exist which contains at least GCATC. From the masses available this may be GCATC itself (1524 Da) or the 7 mer (2180 Da). However, if the mass of the 5 mer is subtracted from the mass of the 7 mer, the remaining mass, 656 Da, does not correspond to any known oligonucleotide. Thus, the 7 mer cannot be next, GCATC is the correct sequence and the next nucleotide must be a G (since cleavage has occurred to give the 5 mer). The sequence is now TTACTGCATCG_ _ _ _ _ _ _ _ C.

The next mass in the T cleavage series must begin with TCG. The only T cleavage mass which permits such a combination is 1235 Da, which corresponds to a TCGA sequence. This sequence must be followed by a T since cleavage has occurred at that point. The overall sequence is, therefore, TTACTGCATCGAT_ _ _ _ _ _ C.

There is only one mass among the available T cleavage series which contains a C, the 593 Da TC. Thus the nucleotide preceding the terminal C must be a T. Likewise, the only TC-containing mass in the A cleavage series that does not contain 2 Cs, which is now known to be not permissible, is 1235, or (A, G)TC. The 1235 mass has already been used once (nucleotides 8-11) but it is also known that there is fragment overlap since the A series only accounts for a total of 16 nucleotides. The sequence is now known to be TTACTGCATCGAT_ _ _ (A, G)TC. However, if the terminal sequence is ATC, there should be a 906 Da mass among the A cleavages; there is not. On the other hand, if the terminal sequence is GTC, a mass of 922 Da should be found among the G cleavage fragments and there is. Thus, the sequence can now be established as TTACTGCATCGAT_ _ _ AGTC.

There is only one available T cleavage mass containing AG but no C, the 946 Da mass consisting of T(A, G). This mass must account for the AG in positions 17 and 18. Therefore, position 16 must be a T; the sequence is now known to be TTACTGCATCGAT_ _ TAGTC.

Only two masses are still available in the A cleavage group, 617 (AT) and 921 (ATT). These complete the overall sequence in two ways, ATATT or ATTAT. None of the masses permits the resolution of this ambiguity. However, all 20 nucleotides in the target oligonucleotide have, in a single experiment, been unambiguously identified and 18 of the 20 have been unambiguously sequenced.
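That the four mononucleotide cleavage sets cannot distinguish the two remaining candidates can be confirmed with a small Python sketch. It is illustrative only: the function names are hypothetical, and the residue masses (A 313, C 289, G 329, T 304 Da) are assumed nominal values. Each candidate is cleaved 5′ of each base in turn and the resulting collections of masses are compared.

```python
# Assumed nominal nucleotide residue masses in Daltons.
RESIDUE_MASS = {"A": 313, "C": 289, "G": 329, "T": 304}


def cleavage_masses(sequence, base):
    """Masses after cleavage 5' of every `base`.

    Returns (mass attached to the primer, sorted masses of free fragments).
    """
    fragments = [""]
    for nt in sequence:
        if nt == base:
            fragments.append(nt)
        else:
            fragments[-1] += nt
    masses = [sum(RESIDUE_MASS[nt] for nt in f) for f in fragments]
    return masses[0], sorted(masses[1:])


def mass_signature(sequence):
    """All four cleavage sets for one candidate sequence."""
    return {base: cleavage_masses(sequence, base) for base in "ACGT"}


candidate_1 = "TTACTGCATCG" + "ATATT" + "AGTC"
candidate_2 = "TTACTGCATCG" + "ATTAT" + "AGTC"
indistinguishable = mass_signature(candidate_1) == mass_signature(candidate_2)
```

Because both candidates give the same signature under these assumptions, additional information of a different kind is required to order the AT and ATT fragments.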

With regard to ambiguity generally, be it one, as in the above example, or more than one, as might be the case when sequencing longer fragments, depending on the nature of the ambiguity and the environment in which it exists, i.e., the nucleotides on either side of it, an additional experiment using any one of several available procedures should readily resolve the matter. For instance, an experiment using the dinucleotide cleavage method of this invention might provide the additional information necessary to resolve the ambiguity. Alternatively, some relaxation of the substantially complete cleavage conditions might result in a ladder of masses in which a known mass is joined with an adjacent ambiguous mass in a manner that clarifies the position and order of the ambiguous mass with respect to the known mass. Or, low accuracy, single pass Sanger sequencing might be employed. Alone, this relatively easy and rapid version of Sanger sequencing would not provide much valuable information but, as a complement to the method of this invention, it would likely provide sufficient information to resolve the ambiguity (and, to the extent the sequencing ladder obtained is unambiguously readable, it would provide a partial redundancy verifying the mass spec data).

Example 7 Simultaneous Incorporation of Modified Nucleotides and Fluorescently Labeled Nucleotides in Amplified Segments.

The following example demonstrates the ability to simultaneously incorporate both modified nucleotides and fluorescent nucleotides into DNA segments during PCR amplification. It is also a demonstration of the ability to cleave the PCR products following amplification at the modified nucleotides, resulting in smaller fluorescent labeled fragments amenable to genotyping by hybridization. Five reactions were set up for 7-nitro-7-deaza-dATP and five reactions for 5-hydroxy dCTP. The volumes of the components in each of the reactions are listed below in microliters (μL). Some of the reagents were available commercially, namely, 10× PCR buffer (Gibco-BRL 11495-017 part no. 52395); 10× enhancer (Gibco-BRL 11495-017 part no. 52391); 1 mM fluorescein 12-dUTP (Molecular Probes, C-7604); and cloned Pfu polymerase 2.5 U/μL (Stratagene 600159).

| Reagents | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 10× PCR Buffer | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 10× Enhancer | 5 | 5 | 5 | 5 | 5 | 0 | 0 | 0 | 0 | 0 |
| 50 mM MgSO₄ | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 |
| 20 μM 2D6-4554-CF-NEW primer | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 20 μM 2D6-4554-LR primer | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 20 ng/μL Genomic DNA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 25 mM dGTP, dCTP, dTTP | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 25 mM 7-nitro-7-deaza-dATP | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0 | 0 | 0 | 0 | 0 |
| 25 mM 5-OH-dCTP | 0 | 0 | 0 | 0 | 0 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 1 mM Fluorescein 12-dUTP | 0 | 1.7 | 1 | 0.7 | 0.5 | 0 | 1.7 | 1 | 0.7 | 0.5 |
| Cloned Pfu polymerase | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 | 0.8 |
| Deionized water | 9 | 7.3 | 8 | 8.3 | 8.5 | 14 | 12.3 | 13 | 13.3 | 13.5 |

The ratios of fluorescein 12-dUTP to dTTP in reactions 2, 3, 4, and 5 above were approximately 1:3, 1:5, 1:7, and 1:10 respectively. The sequence amplified by PCR using the designated primers corresponds to bases 4533 to 4713 in the cytochrome P450 2D6 gene.
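The stated ratios follow from the volumes and stock concentrations in the reaction table. The Python sketch below is illustrative only; it assumes that the 25 mM figure applies to each nucleotide in the dNTP mix individually, and the function name is hypothetical.

```python
def dttp_per_label(label_ul, label_mM=1.0, dttp_ul=0.2, dttp_mM=25.0):
    """Moles of dTTP per mole of fluorescein 12-dUTP in one reaction.

    Volume in microliters times concentration in mM gives nanomoles.
    """
    label_nmol = label_ul * label_mM
    dttp_nmol = dttp_ul * dttp_mM
    return dttp_nmol / label_nmol


# Fluorescein 12-dUTP volumes (microliters) for reactions 2, 3, 4 and 5.
label_volumes = {2: 1.7, 3: 1.0, 4: 0.7, 5: 0.5}
ratios = {rxn: dttp_per_label(vol) for rxn, vol in label_volumes.items()}
rounded = {rxn: round(value) for rxn, value in ratios.items()}
```

Reactions 7 through 10 use the same fluorescein 12-dUTP volumes as reactions 2 through 5, so the same calculation applies to them.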

The reactions were cycled on a MWG Biotech Primus 96^(Plus) thermocycler using the following parameters:

| Step | Temperature | Time | No. of Cycles |
|---|---|---|---|
| 1 | 94° C. | 2 min | 1 cycle |
| 2 | 94° C. | 15 sec | Steps 2-4: 45 cycles |
| 3 | 55° C. | 15 sec | |
| 4 | 72° C. | 2 min | |
| 5 | 72° C. | 7 min | 1 cycle |
| 6 | 4° C. | indefinitely | hold |

5 μL of each sample was removed, mixed with loading buffer and separated by electrophoresis on a 2% agarose gel. The reaction number corresponds to the lane number. The gel was placed on a UV transilluminator and photographed using a Polaroid MP4 camera (FIG. 45).
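For planning purposes, the cycling parameters given above can be written as a small data structure and the programmed hold time totaled. This Python sketch is illustrative only; the structure and names are hypothetical, and the total excludes temperature ramping and the final 4° C. hold, neither of which is specified.

```python
# (segment label, number of cycles, [(temperature in deg C, hold time in s), ...])
PROGRAM = [
    ("step 1", 1, [(94, 120)]),
    ("steps 2-4", 45, [(94, 15), (55, 15), (72, 120)]),
    ("step 5", 1, [(72, 420)]),
]


def programmed_seconds(program):
    """Total programmed hold time, ignoring ramps and the final hold."""
    return sum(cycles * sum(seconds for _, seconds in steps)
               for _, cycles, steps in program)


total_seconds = programmed_seconds(PROGRAM)
total_minutes = total_seconds / 60
```

The 45 repeated cycles account for nearly all of the programmed time, so the 2 minute 72° C. step dominates the length of the run.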

A green fluorescence could be detected in all the fragments (wells 2-5 and 7-10) containing fluorescein 12-dUTP but not in the control wells which were amplified with modified nucleotides but without fluorescein 12-dUTP (wells 1 and 6). Fluorescence in the control wells (wells 1 and 6) which can be seen in the photograph in FIG. 45 was an orange fluorescence, indicating that it was due to trace amounts of ethidium bromide in the gel. This demonstrates that the fluorescein 12-dUTP can be incorporated in the fragment during PCR amplification in the presence of 100% substitution of either 7-nitro-7-deaza-dATP for dATP or 5-hydroxy-dCTP for dCTP.

Following the taking of the photograph in FIG. 45, the agarose gel was stained with ethidium bromide and photographed to visualize the non-fluorescent labeled PCR fragments (wells 1 and 6, FIG. 46). Ethidium bromide staining demonstrates that the intensities of the PCR fragments are approximately the same whether amplified in the presence of fluorescent nucleotides (wells 2-5, 7-10) or in their absence (wells 1 and 6), indicating that incorporation of the fluorescent nucleotide does not inhibit the PCR reaction.

The following reaction was set up to determine whether a PCR reaction containing modified 5-hydroxy-dCTP and fluorescein 12-dUTP could be cleaved to form smaller labeled fragments. All the volumes are in μL.

| | Reagent | Volume |
|---|---|---|
| A. | 10× PCRx buffer | 8 |
| B. | 50 mM MgSO₄ | 3.2 |
| C. | 20 μM 2D6-4554-CF-NEW primer | 2 |
| D. | 20 μM 2D6-4554-LR primer | 2 |
| E. | 20 ng/μL Genomic DNA | 4 |
| F. | 25 mM dATP, dGTP, dTTP | 0.8 |
| G. | 25 mM 5-OH-dCTP | 0.8 |
| H. | 1 mM Fluorescein-12-dUTP | 6.8 |
| I. | cloned Pfu polymerase 2.5 U/μL | 3.2 |
| J. | deionized water | 49.2 |
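The volumes listed above appear to correspond to reaction 7 of the earlier ten-reaction table scaled fourfold, to a total of 80 μL; this correspondence is an observation from the numbers and is not stated in the text. A brief Python check, with hypothetical dictionary keys and function names, is sketched below.

```python
import math

# Reaction 7 of the ten-reaction table, volumes in microliters.
REACTION_7 = {
    "10x PCR buffer": 2, "50 mM MgSO4": 0.8,
    "CF-NEW primer": 0.5, "LR primer": 0.5, "genomic DNA": 1,
    "unmodified dNTPs": 0.2, "5-OH-dCTP": 0.2,
    "fluorescein 12-dUTP": 1.7, "Pfu polymerase": 0.8, "water": 12.3,
}

# Items A through J of the cleavage test reaction, volumes in microliters.
CLEAVAGE_REACTION = {
    "10x PCR buffer": 8, "50 mM MgSO4": 3.2,
    "CF-NEW primer": 2, "LR primer": 2, "genomic DNA": 4,
    "unmodified dNTPs": 0.8, "5-OH-dCTP": 0.8,
    "fluorescein 12-dUTP": 6.8, "Pfu polymerase": 3.2, "water": 49.2,
}


def scale(recipe, factor):
    """Multiply every component volume by the same factor."""
    return {name: volume * factor for name, volume in recipe.items()}


scaled = scale(REACTION_7, 4)
matches = all(math.isclose(scaled[name], CLEAVAGE_REACTION[name])
              for name in CLEAVAGE_REACTION)
total_ul = sum(CLEAVAGE_REACTION.values())
```

Keeping every component at the same proportion means the final concentrations, including the approximately 1:3 ratio of fluorescein 12-dUTP to dTTP, are unchanged by the scale-up.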

The sequence amplified is shown below in FIG. 47 with primers underlined, modified nucleotides indicated with an “m” above the nucleotide on the forward strand and below the nucleotide in the reverse strand, and potential fluorescein dU labeled nucleotides with a “*” above the nucleotide on the forward strand and below the nucleotide on the reverse strand. The sequence corresponds to a region of the cytochrome P450 2D6 gene from nucleotides 4533-4713.

The reactions were cycled on a MWG Biotech Primus 96^(Plus) thermocycler using the following parameters:

| Step | Temperature | Time | No. of Cycles |
|---|---|---|---|
| 1 | 94° C. | 2 min | 1 cycle |
| 2 | 94° C. | 15 sec | Steps 2-4: 45 cycles |
| 3 | 55° C. | 15 sec | |
| 4 | 72° C. | 2 min | |
| 5 | 72° C. | 7 min | 1 cycle |
| 6 | 4° C. | indefinitely | hold |

The reaction was purified over a Sephadex G50 spin column to remove the fluorescein 12-dUTP, which would interfere with the analysis on the ABI 377. The following protocol was used for the purification procedure:

-   A. Re-suspend the resin in the Sephadex G50 spin column.
-   B. Remove the cap at the top and then the cap at the bottom of the Sephadex G50 spin column and let drain by gravity.
-   C. Spin the Sephadex G50 spin column in a Beckman TJ-6R centrifuge for 2 min. at 2000 rpm (1100×g).
-   D. Spin the Sephadex G50 spin column in a Beckman TJ-6R centrifuge one more time for 1 min. at 2000 rpm (1100×g) to remove the residual liquid in the tip.
-   E. Load the sample onto the Sephadex G50 spin column and spin in a Beckman TJ-6R centrifuge at 2000 rpm (1100×g) for 4 min.

The sample was dried in a Savant ISS 100 SpeedVac for 2 hours at high heat. The sample was then re-suspended in 16 μL of 10 mM Tris HCl pH 7.5. 1 μL of 10 mM K₂MnO₄ was added to the reaction, the sample was mixed by vortexing and centrifuged in an Eppendorf 5415C microcentrifuge for 5 seconds. The reaction was incubated for 5 minutes at room temperature. After incubation, 2.6 μL of 7.4 M pyrrolidine/38.5 mM EDTA was added to the tube, the sample was mixed by vortexing and centrifuged in an Eppendorf 5415C microcentrifuge for 5 seconds. The reactions were incubated at 94° C. for 1 hour in an MJ Research PTC100 thermocycler.
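The working concentrations implied by the volumes above can be estimated with a simple dilution calculation. The Python sketch below is illustrative only; it assumes that volumes are additive and that nothing is lost to evaporation, and the function and variable names are hypothetical.

```python
def final_concentration(stock_concentration, added_ul, total_ul):
    """Concentration after dilution, in the same units as the stock."""
    return stock_concentration * added_ul / total_ul


# Oxidation step: 1 uL of the 10 mM K2MnO4 stock added to 16 uL of sample.
oxidation_volume = 16 + 1
oxidant_mM = final_concentration(10.0, 1.0, oxidation_volume)

# Cleavage step: 2.6 uL of 7.4 M pyrrolidine / 38.5 mM EDTA added on top.
cleavage_volume = oxidation_volume + 2.6
pyrrolidine_M = final_concentration(7.4, 2.6, cleavage_volume)
edta_mM = final_concentration(38.5, 2.6, cleavage_volume)
```

Under these assumptions the oxidant is present at roughly 0.6 mM during the 5 minute incubation, and the cleavage step runs at close to 1 M pyrrolidine and 5 mM EDTA.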

An aliquot of the sample, 3 μL, was mixed with 23 μL of loading dye containing Rox-labeled size standards of 10, 20, 30, 40 and 50 bases. 0.8 μL of sample with dye was loaded on a 15% Long Ranger acrylamide gel and electrophoresed on an ABI 377 sequencer. The run was analyzed using GeneScan analysis software. FIG. 48 shows the chromatogram of the ABI 377 run with the expected labeled 23 mer and 34 mer generated during chemical cleavage of the amplified PCR product.

The above data demonstrate that both modified nucleotides and fluorescent nucleotides can be incorporated simultaneously during PCR amplification. They also demonstrate that the PCR fragments can be subsequently cleaved at the modified nucleotides, generating smaller fluorescent labeled fragments that are amenable to genotyping by hybridization.

CONCLUSION

Thus, it will be appreciated that the method of the present invention provides versatile tools for the detection of polymorphism in polynucleotides.

Although certain embodiments and examples have been used to describe the present invention, it will be apparent to those skilled in the art that changes in the embodiments and examples shown may be made without departing from the scope and spirit of this invention.

Other embodiments are within the following claims.

1. A method for detecting polymorphism in a polynucleotide, comprising: providing a polynucleotide suspected of containing a polymorphism; amplifying a segment of the polynucleotide encompassing the suspected polymorphism wherein amplification comprises replacing one or more natural nucleotide(s), one of which is a nucleotide involved in the suspected polymorphism, at substantially each point of occurrence in the segment with a modified nucleotide or, if more than one natural nucleotide is replaced, with different modified nucleotides to form an amplified modified segment; cleaving the amplified modified segment into fragments by contacting it with a reagent or reagents that cleave(s) the segment at substantially each point of occurrence of the modified nucleotide(s); hybridizing the fragments to an oligonucleotide; and, analyzing the hybridized fragments for an incorporated detectable label identifying the suspected polymorphism.
2. The method of claim 1, wherein the detectable label is incorporated during amplification.
3. The method of claim 2, wherein incorporating the detectable label during amplification comprises using a detectably labeled primer.
4. The method of claim 3, wherein the detectably labeled primer comprises a radioactive primer or a primer containing a fluorophore.
5. The method of claim 1, wherein incorporating the detectable label during amplification comprises using a detectably labeled, modified nucleotide.
6. The method of claim 5, wherein the detectably labeled, modified nucleotide comprises a radioactive modified nucleotide or a modified nucleotide containing a fluorophore.
7. The method of claim 5, wherein the detectably labeled, modified nucleotide is a detectably labeled, modified ribonucleotide.
8. The method of claim 7, wherein the detectably labeled, modified ribonucleotide comprises a radioactive modified ribonucleotide or a modified ribonucleotide containing a fluorophore.
9. The method of claim 1, wherein incorporating the detectable label during amplification comprises replacing a natural nucleotide, that is different than the natural nucleotide(s) being replaced with a modified nucleotide(s), at one or more point(s) of occurrence in the segment with a detectably labeled nucleotide.
10. The method of claim 9, wherein the detectably labeled nucleotide comprises a radioactive nucleotide or a nucleotide containing a fluorophore.
11. The method of claim 9, wherein the detectably labeled nucleotide comprises a detectably labeled ribonucleotide.
12. The method of claim 11, wherein the detectably labeled ribonucleotide comprises a radioactive ribonucleotide or a ribonucleotide containing a fluorophore.
13. The method of claim 1, wherein the detectable label is incorporated during cleavage.
14. The method of claim 13, wherein incorporating the detectable label during cleavage comprises using detectably labeled tris(carboxyethyl)phosphine (TCEP).
15. The method of claim 14, wherein using detectably labeled TCEP comprises using radioactive TCEP or TCEP containing a fluorophore.
16. The method of claim 13, wherein incorporating the detectable label during cleavage comprises using a detectably labeled secondary amine.
17. The method of claim 16, wherein using a detectably labeled secondary amine comprises using a radioactive secondary amine or a secondary amine containing a fluorophore.
18. The method of claim 1, wherein the detectable label is incorporated during hybridization.
19. The method of claim 18, wherein incorporating the detectable label during hybridization comprises hybridizing a second, detectably labeled oligonucleotide to the fragments hybridized to the oligonucleotide.
20. The method of claim 19, wherein the second, detectably labeled oligonucleotide comprises a radioactive oligonucleotide or an oligonucleotide containing a fluorophore.
21. The method of claim 1, wherein the detectable label is incorporated after cleavage or after hybridization, the method comprising: cleaving using a reagent comprising TCEP or a secondary amine; and, substituting the TCEP or secondary amine with a radioactive molecule or a fluorophore after cleavage or after hybridization.
22. The method of claim 1, wherein the polymorphism is selected from the group consisting of a single nucleotide polymorphism (SNP), a deletion or an insertion.
23. The method of claim 1, wherein amplifying the segment comprises a polymerase chain reaction (PCR).
24. The method of claim 1, wherein amplifying the segment comprises replacing one natural nucleotide that is involved in the suspected polymorphism at each point of occurrence in the segment with a modified nucleotide to form a modified segment.
25. The method of claim 24, wherein the modified nucleotide comprises a labeled, modified nucleotide.
26. The method of claim 25, wherein the labeled modified nucleotide comprises a radioactive modified nucleotide or a modified nucleotide containing a fluorophore.
27. The method of claim 24, wherein the modified nucleotide comprises a modified ribonucleotide.
28. The method of claim 24, wherein the modified nucleotide comprises a labeled, modified ribonucleotide.
29. The method of claim 28, wherein the labeled, modified ribonucleotide comprises a radioactive ribonucleotide or a ribonucleotide containing a fluorophore.
30. The method of claim 1, wherein hybridizing the fragments to an oligonucleotide comprises using an oligonucleotide that is immobilized on a solid support.
31. The method of claim 1, wherein the incorporated detectable label comprises fluorescence resonance energy transfer (FRET).